
GPU poor as a state of mind

Being rich is not the same as being wealthy. Having a lot of money means you're rich. Owning assets that make you money while you sleep means you're wealthy.

Being broke is not the same as being poor. Having no money means you're broke. Making choices that keep you broke means you're poor.

The GPU poor

The GPU poor complain that they don't have as much compute power as big players. Words of support for the GPU poor usually go like this,

This is basically frugality and asceticism for compute-broke people. It's "how to get by" while living, not paycheck-to-paycheck, but prompt-to-prompt.

Escaping compute poverty

Compute engines consume time and electricity and produce compute.

Most of the time, your compute engines are idle. You use them on demand and in real time.

You want more compute. What is to be done?

Ten years ago, you might have been advised to mine Bitcoin. Not anymore. Now that professionals are in the game, an amateur rig's revenue won't cover its energy and maintenance costs.

Today, you might be told to use something like vast to sell compute to the grid. Your idle time costs you nothing, so you set a price that covers energy and maintenance.
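As a rough sketch of that pricing logic: the break-even hourly price is the energy used in an hour plus an amortized maintenance cost. The function name and every number below (wattage, electricity rate, wear cost) are made-up assumptions for illustration, not figures from any marketplace.

```python
def break_even_price_per_hour(power_watts, electricity_per_kwh, maintenance_per_hour):
    """Lowest hourly rental price that covers energy and maintenance."""
    # Energy used in one hour (kWh) times the price per kWh.
    energy_cost = (power_watts / 1000.0) * electricity_per_kwh
    return energy_cost + maintenance_per_hour


# Hypothetical rig: 350 W under load, $0.15 per kWh, $0.02 per hour of wear.
price = break_even_price_per_hour(350, 0.15, 0.02)
print(f"break-even price: ${price:.4f} per hour")
```

Anything you charge above that number is profit, which is exactly the point: the output of this exercise is money, not compute.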

Both of these proposals are fundamentally flawed. They don't solve the root of the problem.

They both use your compute engine to trade time and electricity for money. But you didn't want more money. You wanted more compute. And the difference is important.

Money vs Compute

Transformer-based models, like chat bots, produce one token at a time. The algorithm to generate one token is parallelizable (hence GPUs), but generating a sequence of tokens is a linear process: each token depends on the ones before it.
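Here is a toy sketch of why the sequence is linear. `next_token` is a hypothetical stand-in for a model's forward pass, not a real transformer; the only thing it shares with one is that its output depends on the whole context so far.

```python
def next_token(context):
    """Toy stand-in for a forward pass: the result depends on every prior token."""
    # In a real model, this is the part a GPU can parallelize (large matrix math).
    return (sum(context) * 31 + len(context)) % 100


def generate(prompt, n_tokens):
    """Append n_tokens to the prompt, one at a time."""
    tokens = list(prompt)
    for _ in range(n_tokens):
        # Step t cannot start until step t-1 has produced its token.
        tokens.append(next_token(tokens))
    return tokens


out = generate([1, 2, 3], 5)
print(out)
```

No matter how fast `next_token` gets, the loop in `generate` still runs its iterations in order.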

It's not just transformers that are linear in this way. Image generation iterates over a linear sequence of denoising steps. Video generation happens a frame at a time.1

Obviously, more money buys you more compute engines and more electricity, which, to a certain extent, buys you time. But you can only reduce the linearity of the problem; you can't remove it.
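One standard way to put numbers on this is Amdahl's law, which I'm borrowing here as an illustration. It says that if only part of a job can be parallelized, the serial remainder caps the total speedup. The 90% parallel fraction below is an arbitrary assumption, not a measurement of any real workload.

```python
def amdahl_speedup(parallel_fraction, n_engines):
    """Overall speedup when only parallel_fraction of the work scales with n_engines."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_engines)


# Assume (hypothetically) that 90% of the work parallelizes perfectly.
for n in (1, 2, 8, 64, 1024):
    print(n, round(amdahl_speedup(0.9, n), 2))
# The speedup keeps rising with n but stays below 1 / (1 - 0.9) = 10.
```

Past a certain point, each extra compute engine buys almost nothing, because the serial 10% is untouched.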

For real-time tasks, I think this is the best we can do. Chat bots chat with a user. Image gen workflows depend on the user directing the process between each batch of generated images. The best advice for real-time tasks is just "spend more money on compute engines and electricity" (along with the typical frugality/efficiency advice).

Compute Wealthy

But what about those inherently linear tasks that we don't need done in real-time? This is where we compute-broke commoners can truly become compute-wealthy.

Here are two examples of the type of thing I'm talking about, both of which I want. I'm actively working on the second one.

1. I'm working on a programming project. At the end of the day, I write a TODO list. While I sleep, agentic workflows iteratively generate, critique, and test patches. When I wake up, I review the patches, merge, and deploy. Then, I write another TODO list.

2. I'm using a deeply personalized learning platform. It uses a spaced-repetition system to help me track and efficiently review the concepts I want to learn. Without the platform, I would find the material rather dry. But each night while I sleep, the system generates a series of media-rich (audio, video, images) exercises, specific to my interests and the things I need to learn or review. When it's time to learn, I "play the game" it generated for me. It tracks my progress (and my level of engagement) and uses this updated state to generate tomorrow's tasks.
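The overnight loop in the first example might be sketched like this. Everything here is hypothetical: `run_overnight` and the `toy_*` functions are placeholders I made up so the sketch runs without a model, and a real version would call an actual model and an actual test suite.

```python
def run_overnight(todo_items, generate, critique, test, max_rounds=3):
    """Retry each TODO item until a patch passes its tests or the rounds run out."""
    ready_for_review = {}
    for item in todo_items:
        feedback = None
        for _ in range(max_rounds):
            patch = generate(item, feedback)  # slow, sequential model call
            if test(patch):
                ready_for_review[item] = patch
                break
            feedback = critique(patch)  # feeds into the next attempt
    return ready_for_review


# Toy stand-ins: the first attempt always fails, the second always passes.
def toy_generate(item, feedback):
    return f"{item}:v2" if feedback else f"{item}:v1"


def toy_critique(patch):
    return "tests failed, try again"


def toy_test(patch):
    return patch.endswith("v2")


patches = run_overnight(["fix login bug", "add retry logic"],
                        toy_generate, toy_critique, toy_test)
print(patches)
```

Each round depends on the previous round's critique, so the work is linear. But nobody is waiting on it, so the hours you spend asleep are hours of compute you would otherwise have left idle.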

Footnotes

1

"Video generation happens a frame at a time." Maybe you come up with a clever parallelization hack, like using grid of low-res pre-rendered video frames as control vectors to an image generator, using overlapping frames for style matching, and then up-sampling the frames. In that case, you're not going one frame at a time. But there is still plenty of fundamentally non-parallelizable steps to perform along the way.