Teng Yan
December 3, 2024

Our Crypto AI Thesis (Part II): Decentralised Compute is King

Decentralised compute is the backbone of Crypto AI — GPU marketplaces, training & inference.

I haven’t shaken this one big miss.

It still haunts me because it was the most obvious bet for anyone paying attention, yet I didn’t invest a single dollar.

No, it wasn’t the next Solana killer or a memecoin with a dog wearing a funny hat.

It was… NVIDIA.

NVDA share price year-to-date. Source: Google

In just one year, NVDA 3x’d, soaring from a $1T to a $3T market cap. It even outperformed Bitcoin over the same period.

Sure, some of that is AI hype. But a huge part of it is grounded in reality. NVIDIA reported $60B in revenue for FY2024, a staggering 126% increase from 2023. This growth was driven by Big Tech snapping up GPUs in a global AI arms race to AGI.

So why did I miss it?

For two years, I was laser-focused on crypto and didn’t look outside to what was happening in AI. That was a big mistake, and it still eats at me.

But I’m not making the same mistake twice.

Today, Crypto AI feels eerily similar. We’re on the brink of an innovation explosion. The parallels to the California Gold Rush of the mid-1800s are hard to ignore—industries and cities sprang up overnight, infrastructure advanced at breakneck speed, and fortunes were made by those who dared to leap.

Like NVIDIA in its early days, Crypto AI will feel obvious in hindsight.

In Part I of my thesis, I explained why Crypto AI is today's most exciting underdog opportunity for investors and builders.

Here’s a quick recap:

  • Many still dismiss it as “vaporware.”
  • Crypto AI is in its early cycle—likely 1-2 years away from peak hype.
  • There’s a $230B+ growth opportunity in this space, at minimum.

At its core, Crypto AI is AI with crypto infrastructure layered on top. This means it’s more likely to track AI’s exponential growth trajectory than the broader crypto market. So, to stay ahead, you’ve got to tune into the latest AI research on arXiv and talk to founders who believe they’re building the next big thing.

In Part II of my thesis, I’ll dive into four of the most promising subsectors in Crypto AI:

  1. Decentralised Compute: Training, Inference & GPU marketplaces
  2. Data networks
  3. Verifiable AI
  4. AI Agents living on-chain

This piece represents the culmination of weeks of deep research and conversations with founders and teams across the Crypto AI landscape. It’s not designed to be an exhaustive deep dive into every sector—that’s a rabbit hole for another day.

Instead, consider it a high-level roadmap crafted to spark curiosity, sharpen your research, and guide investment thinking.

Mapping out the landscape

I picture the decentralised AI stack as a layered ecosystem: it starts with decentralised compute and open data networks on one end, which power decentralised AI model training.

Every inference is then verified—inputs and outputs alike—using a combination of cryptography, cryptoeconomic incentives, and evaluation networks. These verified outputs flow into AI agents that can operate autonomously on-chain, as well as consumer and enterprise AI applications that users can actually trust.

Coordination networks tie it all together, enabling seamless communication and collaboration across the ecosystem.

In this vision, anyone building in AI could tap into one or more layers of this stack, depending on their specific needs. Whether leveraging decentralised compute for model training or using evaluation networks to ensure high-quality outputs, the stack offers a range of options.

Thanks to blockchain’s inherent composability, I believe we are naturally moving toward a modular future. Each layer is becoming hyper-specialized, with protocols optimized for distinct functions rather than an all-in-one integrated approach.

Source: topology.vc

There’s been a Cambrian explosion of startups building across every layer of the decentralised AI stack, most founded in just the last 1-3 years. It’s clear: we’re still early.

The most comprehensive and up-to-date map of the Crypto AI startup landscape I’ve seen is maintained by Casey and her team over at topology.vc. It’s an invaluable resource for anyone tracking the space.

As I dive into the Crypto AI subsectors, I’m constantly asking myself: how big is the opportunity here? I’m not interested in small bets—I’m looking for markets that can scale into hundreds of billions.

1. Market Size

Let’s start with the market size. When evaluating a subsector, I ask myself: is it creating a brand-new market or disrupting an existing one?

Take decentralised compute, for instance. It’s a disruptive category whose potential can be estimated by looking at the established cloud computing market, worth ~$680B today and expected to reach $2.5T by 2032.

New markets with no precedents, like AI agents, are tougher to quantify. Without historical data, sizing them up involves a mix of educated guesses and gut checks on the problems they’re solving. And the pitfall is that sometimes, what looks like a new market is really just a solution looking for a problem.

2. Timing

Timing is everything. Technology tends to improve and become cheaper over time, but the pace of progress varies.

How mature is the technology in a given subsector? Is it ready to scale, or is it still in the research phase, with practical applications years away? Timing determines whether a sector deserves immediate attention or should be left in the “wait and see” category.

Take Fully Homomorphic Encryption (FHE) as an example: the potential is undeniable, but today it’s still too slow for widespread use. We’re likely several years out from seeing it hit mainstream viability. By focusing on sectors closer to scaling first, I can spend my time and energy where the momentum—and opportunity—is building.

If I were to map these categories on a size vs. timing chart, it would look something like this. Keep in mind that this is more of a conceptual sketch than a hard-and-fast guide. There’s plenty of nuance here: for example, within verifiable inference, different approaches like zkML and opML are at different levels of readiness for use.

That said, I am convinced that AI’s scale will be so massive that even what looks “niche” today could evolve into a significant market.

It’s also worth noting that technological progress doesn’t always follow a straight line—it often happens in leaps. My views on timing and market size will shift when emergent breakthroughs occur.

With this framework in mind, let’s break down each sub-sector.

Sector 1: Decentralised compute

TL;DR

  • Decentralised compute is the backbone of decentralised AI.
  • GPU marketplaces, decentralised training and decentralised inference are deeply interconnected and thrive together.
  • The supply side usually comes from small-mid tier data centres and consumer GPUs.
  • The demand side is small but growing. Today it comes from price-sensitive, latency-insensitive users and smaller AI startups.
  • The biggest challenge for Web3 GPU marketplaces today is actually making them work.
  • Orchestrating GPUs across a decentralised network requires advanced engineering and a well-designed, robust network architecture.

1.1. GPU Marketplaces / Compute Networks

Several Crypto AI teams are positioning themselves to capitalize on the shortage of GPUs relative to demand by building decentralised networks that tap into the global pool of latent compute power.

The core value proposition for GPU marketplaces is 3-fold:

  1. You can access compute at “up to 90% cheaper” than AWS, which comes from (1) removing middlemen and (2) opening up the supply side. Essentially, these marketplaces allow you to tap into the lowest marginal cost of compute globally.
  2. Greater flexibility: No lock-in contracts, no KYC, no waiting times.
  3. Censorship-resistance

To tackle the supply side of the market, these marketplaces source compute from:

  • Enterprise-grade GPUs (e.g. A100s, H100s) from small-mid tier data centres struggling to find demand on their own, or from Bitcoin miners looking to diversify. I also know of teams tapping into large government-funded infrastructure projects, where data centres have been built as part of technology growth initiatives. These providers are often incentivized to keep their GPUs on the network, which helps them offset the amortization costs of their GPUs.
  • Consumer-grade GPUs from the millions of gamers and home users who connect their computers to the network in exchange for token incentives

On the other hand, the demand side for decentralised compute today comes from:

  1. Price-sensitive, latency-insensitive users. This segment prioritizes affordability over speed. Think researchers exploring new fields, indie AI developers, and other cost-conscious users who don’t need real-time processing. Due to budget constraints, many of them may struggle with traditional hyperscalers like AWS or Azure. Because they are quite distributed across the population, targeted marketing is crucial to bring this group on board.
  2. Smaller AI startups face challenges securing flexible, scalable compute resources without locking into long-term contracts with major cloud providers. Business development is vital in attracting this segment, as they’re actively seeking alternatives to hyperscaler lock-in.
  3. Crypto AI startups building decentralised AI products but without their own supply of compute will need to tap into the resources of one of these networks.
  4. Cloud gaming: While not directly AI-driven, cloud gaming is a rising source of demand for GPU resources.

The key thing to remember: developers always prioritise costs and reliability.

The Real Challenge: Demand, Not Supply

Startups in this space often tout the size of their GPU supply networks as a sign of success. But this is misleading—it is a vanity metric at best.

The real constraint is not supply but demand. The key metrics to track aren’t the number of GPUs available, but rather the utilization rate and the number of GPUs actually rented out.

Tokens are excellent at bootstrapping the supply side, creating the incentives necessary to scale up quickly. However, they don’t inherently solve the demand problem. The real test is getting the product to a good enough state where latent demand materializes.
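To make this concrete, here is a tiny, hypothetical illustration of the metrics I’d look at. The figures and function names are made up for illustration, not drawn from any real marketplace:

```python
# Hypothetical illustration: why utilization beats raw supply as a health metric.
# All numbers and names below are made up, not data from any real marketplace.

def marketplace_health(listed_gpus, rented_gpus, gpu_hours_billed, gpu_hours_available):
    """Return the demand-side metrics that actually matter."""
    return {
        "utilization_rate": gpu_hours_billed / gpu_hours_available,  # share of capacity actually sold
        "rented_share": rented_gpus / listed_gpus,                    # share of listed GPUs earning revenue
    }

# A network boasting 100,000 GPUs but renting out only 3,000 of them...
big_supply = marketplace_health(100_000, 3_000, 1.8e6, 72e6)
# ...is a weaker business than one listing 10,000 GPUs and renting out 7,000.
lean_supply = marketplace_health(10_000, 7_000, 4.2e6, 7.2e6)

print(big_supply)    # {'utilization_rate': 0.025, 'rented_share': 0.03}
print(lean_supply)   # {'utilization_rate': ~0.58,  'rented_share': 0.7}
```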

Haseeb Qureshi (Dragonfly) puts it best:

Having a token does not magically bootstrap network effects. This was the old mantra in crypto and some people seem to still believe it.

At best, a token can help bootstrap the supply side of a network. But the demand side comes from a great product & GTM, not from a token.

— Haseeb >|< (@hosseeb)
5:46 PM • Sep 9, 2024

Making Compute Networks Actually Work

Contrary to popular belief, the biggest hurdle for Web3 distributed GPU marketplaces today is simply getting them to work properly.

This isn’t a trivial problem.

Orchestrating GPUs across a distributed network is complex, with layers of challenges—resource allocation, dynamic workload scaling, load balancing across nodes and GPUs, latency management, data transfer, fault tolerance, and handling diverse hardware scattered across various geographies. I could go on and on.

Achieving this requires serious engineering and a robust, properly designed network architecture.

To put it in perspective, consider Google’s Kubernetes. It’s widely regarded as the gold standard for container orchestration, automating processes like load balancing and scaling in distributed environments—very similar challenges to those faced by distributed GPU networks. Kubernetes itself was built on over a decade of Google’s experience, and even then, it took years of relentless iteration to get right.

Some of the GPU compute marketplaces that are already live today can handle small-scale workloads, but the cracks start to show as soon as they try to scale. I suspect this is because they were built on poorly designed architectural foundations.

Another challenge/opportunity for decentralised compute networks is ensuring trustworthiness: verifying that each node is actually providing the compute power it claims. Currently, this often rests on a provider’s reputation, and in some cases compute providers are ranked by reputation scores. Blockchain seems to be a natural fit for trustless verification systems. Startups like Gensyn and Spheron are pushing for a trustless approach to solving this issue.

Today, many Web3 teams are still navigating these challenges, meaning the opportunity is wide open.

Decentralised Compute Market Size

How big is the market for decentralised compute networks?

Today, it’s probably just a tiny fraction of the $680B - $2.5T cloud computing industry. Yet, despite the added friction for users, there will always be some demand as long as costs stay lower than those of traditional providers.

I believe costs will remain lower in the near-to-mid term due to a mix of token subsidies and the unlocking of supply from providers who aren’t price-sensitive (for example, if I can rent out my gaming laptop for extra cash, I’m happy whether it earns $20 or $50 a month).

But the true growth potential for decentralised compute networks—and the real expansion of their TAM—will come when:

  1. Decentralised training of AI models becomes practical
  2. Demand for inference explodes and existing data centres are not able to meet it. This is already starting to play out. Jensen Huang says that inference demand is going to increase “a billion times”.
  3. Proper Service-Level Agreements (SLAs) become available, addressing a critical barrier to enterprise adoption. Currently, decentralised compute operates on a best-effort basis, leaving users with varying levels of service quality (e.g. % uptime). With SLAs in place, these networks could offer standardized reliability and performance metrics, making decentralised compute a viable alternative to traditional cloud compute providers.

Decentralised, permissionless compute stands as the base layer—the foundational infrastructure—for a decentralised AI ecosystem.

Despite the ongoing expansion in the supply chain for silicon (i.e. GPUs), I believe we’re only at the dawn of humanity’s Intelligence era. There will be an insatiable demand for compute.

Watch for the inflection point that could trigger a major re-rating of all working GPU marketplaces. It’s probably coming soon.

Other Notes:

  • The pure-play GPU marketplace is crowded, with competition among decentralised platforms and also the rise of Web2 AI neoclouds like Vast.ai and Lambda.
  • Small nodes (e.g., 4 x H100) are not in much demand because of their limited use, but good luck finding anyone selling large clusters—they’re still in serious demand.
  • Will a dominant player aggregate all the compute supply for decentralised protocols, or will it remain fragmented among multiple marketplaces? I’m leaning towards the former and a power law distribution in outcomes, as consolidation often drives efficiency in infrastructure. But it will take time to play out, and meanwhile, fragmentation and messiness continue.
  • Developers want to focus on building applications, not dealing with deployment and configuration. Marketplaces must abstract away these complexities, making access to compute as frictionless as possible.

1.2. Decentralised Training

TL;DR

  • If scaling laws hold, training the next generation of frontier AI models in a single data centre will one day become physically impossible.
  • Training AI models requires a lot of data transfer between GPUs. Low data transfer (interconnect) speed between distributed GPUs is often the biggest barrier.
  • Researchers are exploring multiple approaches simultaneously, and breakthroughs are happening (e.g. Open DiLoCo, DisTrO). These advances will stack and compound, accelerating progress in the space.
  • The future for decentralised training likely lies in smaller, specialized models designed for niche applications rather than frontier, AGI-focused models.
  • Inference demand is poised to skyrocket with the shift towards models like OpenAI’s o1, creating opportunities for decentralised inference networks.

Picture this: a massive, world-changing AI model, not developed in secretive elite labs but brought to life by millions of everyday people. Gamers, whose GPUs typically render cinematic explosions in Call of Duty, now lend their hardware to something grander: an open-source, collectively owned AI model with no central gatekeepers.

In this future, foundation-scale models aren’t just the domain of the top AI labs.

But let’s ground this vision in today’s reality. For now, the lion’s share of heavyweight AI training remains anchored in centralized data centres, and this will likely be the norm for some time.

Companies like OpenAI are scaling up their massive clusters. Elon Musk recently announced that xAI is nearing the completion of a data centre boasting the equivalent of 200,000 H100 GPUs.

But it’s not only about the raw GPU count. Model FLOPS utilization (MFU), a metric introduced in Google’s PaLM paper in 2022, measures how much of a GPU’s theoretical peak capacity is actually put to use during training. Surprisingly, MFU often hovers around just 35-40%.

Why so low? While GPU performance has skyrocketed over the years following Moore’s law, network, memory, and storage improvements have lagged behind significantly, creating bottlenecks. As a result, GPUs frequently sit idle, waiting for data.
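To see what that number actually measures, here is a rough sketch of the MFU calculation. It uses the common approximation that training a dense transformer takes about 6 FLOPs per parameter per token; the hardware figures are illustrative, not from any specific cluster:

```python
# Rough sketch of Model FLOPS Utilization (MFU), the metric from Google's PaLM paper:
# the share of the hardware's theoretical peak FLOPS that ends up as useful model compute.
# Uses the common ~6 FLOPs-per-parameter-per-token approximation for dense transformers.

def mfu(params, tokens_per_second, num_gpus, peak_flops_per_gpu):
    achieved_flops = 6 * params * tokens_per_second   # useful training FLOPs actually performed
    peak_flops = num_gpus * peak_flops_per_gpu        # what the hardware could do in theory
    return achieved_flops / peak_flops

# Illustrative numbers (not a real cluster): a 70B-parameter model training at
# 900k tokens/s on 1,024 GPUs, each rated at ~1e15 BF16 FLOPS.
print(f"MFU: {mfu(70e9, 9e5, 1024, 1e15):.0%}")   # ~37%; the rest is lost to data movement and idle time
```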

AI training remains highly centralized today because of one word — Efficiency. 

Training large models depends on techniques like:

  • Data parallelism: Splitting datasets across multiple GPUs to perform operations in parallel, accelerating the training process.
  • Model parallelism: Distributing parts of the model across GPUs to bypass memory constraints.

These methods require GPUs to exchange data constantly, making interconnect speed—the rate at which data is transferred across computers in the network—absolutely essential.
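Here is a minimal, CPU-only simulation of data-parallel training that shows where that constant exchange happens. The model is a toy linear regression rather than a real neural network, but the communication pattern, a gradient all-reduce on every single step, is exactly what makes interconnect speed so critical:

```python
# CPU-only simulation of data-parallel training on a toy linear-regression "model".
# Each worker computes gradients on its own shard of the batch; all workers then
# average gradients (an all-reduce) before updating. That exchange moves roughly
# model-sized payloads on EVERY step, which is why interconnect speed dominates.
import numpy as np

rng = np.random.default_rng(0)
n_workers, dim, lr = 4, 1_000, 0.1
w = np.zeros(dim)                 # replicated model weights (every worker holds a copy)
true_w = rng.normal(size=dim)     # the "ground truth" the model is trying to learn

for step in range(100):
    local_grads = []
    for _ in range(n_workers):    # each worker processes a different mini-batch shard
        X = rng.normal(size=(32, dim))
        y = X @ true_w
        grad = X.T @ (X @ w - y) / len(y)   # mean-squared-error gradient on this shard
        local_grads.append(grad)
    # The all-reduce: a full gradient (same size as the model) crosses the network
    # every step. Over slow links, this exchange is the bottleneck, not the math.
    w -= lr * np.mean(local_grads, axis=0)

print("distance to true weights:", np.linalg.norm(w - true_w))
```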

When frontier AI model training can cost upwards of $1B, every efficiency gain matters.

With their high-speed interconnects, centralised data centres enable rapid data transfer between GPUs and deliver substantial cost savings in training time that decentralised setups can’t match… yet.

Overcoming Slow Interconnect Speed

If you talk with people working in the AI space, many will tell you that decentralised training just won’t work.

In decentralised setups, GPU clusters aren’t physically co-located, so transferring data between them is much slower and becomes a bottleneck. Training requires GPUs to sync and exchange data at each step. The farther apart they are, the higher the latency. Higher latency means slower training speed and higher costs.

What might take a few days in a centralized data centre could stretch to two weeks with a decentralised approach at a higher cost. That’s simply not viable.

But this is set to change.

The good news is that there’s been a massive surge of interest in research around distributed training. Researchers are exploring multiple approaches simultaneously, as evidenced by the flurry of studies and published papers. These advances will stack and compound, accelerating progress in the space.

It’s also about testing in production and seeing how far we can push boundaries.

Some decentralised training techniques can already handle smaller models in slow interconnect environments. Now, frontier research is pushing to extend these methods to ever-larger models.

  • For example, Prime Intellect’s OpenDiLoCo paper demonstrates a practical approach that involves “islands” of GPUs performing 500 local steps before syncing, slashing bandwidth requirements by up to 500x (a minimal sketch of this local-steps-then-sync pattern follows this list). What started as Google DeepMind’s research into smaller models has since been scaled to train a 10-billion-parameter model in November, and fully open-sourced today.
Releasing INTELLECT-1: We’re open-sourcing the first decentralized trained 10B model:

- INTELLECT-1 base model & intermediate checkpoints
- Pre-training dataset
- Post-trained instruct models by @arcee_ai
- PRIME training framework
- Technical paper with all details

— Prime Intellect (@PrimeIntellect)
9:18 PM • Nov 29, 2024
  • Nous Research is raising the bar with DisTrO, a family of distributed optimizers that delivers up to a jaw-dropping 10,000x reduction in inter-GPU communication requirements while training a 1.2B-parameter model.
  • And the momentum keeps building. In December, Nous announced the pre-training of a 15B-parameter model with a loss curve (how the model’s error decreases over time) and convergence rate (how quickly the model’s performance stabilizes) that match or surpass the results typically seen with centralized training setups. Yes, better than centralized.
Nous Research announces the pre-training of a 15B parameter language model over the internet, using Nous DisTrO and heterogeneous hardware contributed by our partners at @Oracle, @LambdaAPI, @NorthernDataGrp, @CrusoeCloud, and the Andromeda Cluster.

This run presents a loss… x.com/i/web/status/1…

— Nous Research (@NousResearch)
4:34 PM • Dec 2, 2024
  • SWARM Parallelism and DTFMHE are other methods for training very large AI models across different types of devices, even if those devices have varying speeds and connections.
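To make the “train locally, sync rarely” pattern behind OpenDiLoCo concrete, here is a minimal, hypothetical sketch. This is not Prime Intellect’s implementation: the real method applies an outer optimizer with Nesterov momentum to pseudo-gradients, while this toy version simply averages island weights at each sync. The point is the communication pattern: islands exchange data once per outer round instead of once per step.

```python
# Toy sketch of the "train locally, sync rarely" pattern behind OpenDiLoCo-style training.
# Not Prime Intellect's code: the real method applies an outer optimizer (Nesterov
# momentum) to pseudo-gradients, while this sketch simply averages island weights.
import numpy as np

rng = np.random.default_rng(1)
n_islands, dim, lr, local_steps = 4, 500, 0.02, 500
true_w = rng.normal(size=dim)
weights = [np.zeros(dim) for _ in range(n_islands)]   # each island keeps its own copy

for outer_round in range(5):              # 5 syncs in total, instead of 5 * 500 per-step all-reduces
    for i in range(n_islands):
        w = weights[i]
        for _ in range(local_steps):      # communication-free inner loop on the island's own data
            X = rng.normal(size=(32, dim))
            grad = X.T @ (X @ w - X @ true_w) / 32
            w -= lr * grad
        weights[i] = w
    # The only cross-island traffic: one weight exchange per outer round,
    # roughly `local_steps` times less communication than syncing every step.
    synced = np.mean(weights, axis=0)
    weights = [synced.copy() for _ in range(n_islands)]

print("distance to true weights:", np.linalg.norm(weights[0] - true_w))
```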

Another challenge is managing a diverse range of GPU hardware, including consumer-grade GPUs with limited memory that are typical in decentralised networks. Techniques like model parallelism (splitting model layers across devices) can help make this feasible.

The Future of Decentralised Training

Current decentralised training methods still cap out at model sizes well below the frontier (GPT-4 is reportedly close to a trillion parameters, 100x larger than Prime Intellect’s 10B model). To truly scale, we will need breakthroughs in model architecture, better networking infrastructure, and smarter task-splitting across devices.

And we can dream big. Imagine a world where decentralised training aggregates more GPU compute power than even the largest centralized data centres could ever muster.

Pluralis Research (a sharp team in decentralised training, one to watch closely) argues that this isn’t just possible—it’s inevitable. Centralized data centres are bound by physical constraints like space and the availability of power, while decentralised networks can tap into an effectively limitless pool of global resources.

Even NVIDIA’s Jensen Huang has acknowledged that async decentralised training could unlock the true potential of AI scaling. Distributed training networks are also more fault-tolerant.

So in one potential future, the world's most powerful AI models will be trained in a decentralised fashion.

It’s an exciting prospect, but I’m not yet fully convinced. We need stronger evidence that decentralised training of the largest models is technically and economically viable.

Here’s where I see immense promise: Decentralised training’s sweet spot could lie in smaller, specialized, open-source models designed for targeted use cases, rather than competing with the ultra-large, AGI-driven frontier models. Certain architectures, especially non-transformer models, are already proving a natural fit for decentralised setups.

And there’s another piece to this puzzle: tokens. Once decentralised training becomes feasible at scale, tokens could play a pivotal role in incentivizing and rewarding contributors, effectively bootstrapping these networks.

The road to this vision is long, but progress is deeply encouraging. Advances in decentralised training will benefit everyone—even big tech and top-tier AI research labs—as the scale of future models will outgrow the capacity of a single data centre.

The future is distributed. And when a technology holds such broad potential, history shows it always gets better, faster, than anyone expects.

1.3. Decentralised Inference

Right now, the majority of compute power in AI is being funnelled into training massive models. Top AI labs are in an arms race to develop the best foundational models and ultimately achieve AGI.

But here’s my take: this intense compute focus on training will shift towards inference in the coming years. As AI becomes increasingly embedded in the applications we use daily—from healthcare to entertainment—the compute resources needed to support inference will be staggering.

And it’s not just speculation. Inference-time compute scaling is the latest buzzword in AI. OpenAI recently released a preview/mini version of its latest model, o1 (codename: Strawberry), and the big shift? It takes its time to think, first asking itself what steps it should take to answer the question, then working through each of those steps.

This model is designed for more complex, planning-heavy tasks, like solving crossword puzzles, and for problems that require deeper reasoning. You’ll notice it’s slower, taking more time to generate responses, but the results are far more thoughtful and nuanced. It is also much more expensive to run (25x the cost of GPT-4).

The shift in focus is clear: the next leap in AI performance won’t come just from training bigger models but also from scaling up compute use during inference.

If you want to read more, several research papers demonstrate:

  • Scaling inference compute through repeated sampling leads to large improvements across various tasks (see the toy sketch after this list).
  • There is an exponential scaling law for inference, too.
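As a toy illustration of the first point, assume each independent attempt at a hard problem succeeds with some small probability. Real model samples are not independent and you still need a way to pick the right answer, so treat this as intuition rather than a benchmark:

```python
# Toy model of inference-time scaling via repeated sampling: if each independent
# attempt solves a hard problem with probability p, drawing k samples and keeping
# any correct one lifts coverage to 1 - (1 - p)^k. Real samples are not independent
# and you still need a verifier to pick the winner, so this is intuition only.

def coverage(p: float, k: int) -> float:
    return 1 - (1 - p) ** k

for k in (1, 10, 100, 1000):
    print(f"{k:>4} samples -> {coverage(0.02, k):6.1%} chance at least one is correct")
# Output (p = 2% per attempt): 1 -> 2.0%, 10 -> 18.3%, 100 -> 86.7%, 1000 -> ~100%
```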

Once powerful models are trained, their inference tasks (where the models actually produce outputs) can be offloaded to decentralised compute networks. This makes so much sense because:

  • Inference is far less resource-intensive than training. Once trained, models can be compressed and optimized using techniques like quantization, pruning, or distillation (a minimal quantization sketch follows this list). They can even be split up with tensor or pipeline parallelism to run on everyday consumer devices. You don’t need a high-end GPU to power inference.
  • It’s already happening. Exo Labs has figured out how to run a 405B-parameter Llama 3.1 model on consumer-grade hardware like MacBooks and Mac Minis. Distributing inference across many devices can handle even large-scale workloads efficiently and cost-effectively.
M4 Mac AI Coding Cluster

Uses @exolabs to run LLMs (here Qwen 2.5 Coder 32B at 18 tok/sec) distributed across 4 M4 Mac Minis (Thunderbolt 5 80Gbps) and a MacBook Pro M4 Max.

Local alternative to @cursor_ai (benchmark comparison soon).

— Alex Cheema - e/acc (@alexocheema)
7:37 AM • Nov 12, 2024
  • Better user experience. Running computations closer to the user slashes latency, which is critical for real-time applications like gaming, AR, or self-driving cars. Every millisecond matters. 
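The quantization technique mentioned in the list above can be sketched in a few lines. Below is PyTorch’s post-training dynamic quantization applied to a toy MLP; the layer sizes are illustrative and this is not a real LLM:

```python
# Minimal sketch of post-training dynamic quantization with PyTorch: weights of
# Linear layers are stored as int8 and dequantized on the fly, cutting their memory
# footprint roughly 4x versus fp32. Toy MLP with illustrative sizes, not a real LLM.
import torch
import torch.nn as nn

model = nn.Sequential(            # stand-in for one transformer block's feed-forward layers
    nn.Linear(4096, 11008),
    nn.GELU(),
    nn.Linear(11008, 4096),
)

quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 4096)
with torch.no_grad():
    print("fp32 output:", model(x)[0, :3])
    print("int8 output:", quantized(x)[0, :3])   # nearly identical, far cheaper to store and serve
```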

Think of decentralised inference like a CDN (content delivery network) for AI: instead of delivering websites quickly by connecting to nearby servers, decentralised inference taps into local compute power to deliver AI responses in record time. By embracing decentralised inference, AI apps become more efficient, responsive, and reliable.

The trend is clear. Apple’s new M4 Pro chip rivals NVIDIA’s RTX 3070 Ti—a GPU that, until recently, was the domain of hardcore gamers. The hardware we already have is increasingly capable of handling advanced AI workloads.

Crypto’s Value-Add

For decentralised inference networks to succeed, there must be compelling economic incentives for participation. Nodes in the network need to be compensated for their compute contributions, and the system must ensure rewards are distributed fairly and efficiently. Geographical diversity is also essential: it reduces latency for inference tasks and improves fault tolerance.

And the best way to build decentralised networks? Crypto.

Tokens provide a powerful mechanism for aligning participants' interests, ensuring everyone is working toward the same goal: scaling the network and driving up the token’s value.

Tokens also supercharge network growth. They help solve the classic chicken-and-egg problem that stalls most networks by rewarding early adopters and driving participation from day one.

The success of Bitcoin and Ethereum proves this point—they’ve already aggregated the largest pools of computing power on the planet.

Decentralised inference networks are next in line. With geographical diversity, they reduce latency, improve fault tolerance, and bring AI closer to the user. And with crypto-powered incentives, they’ll scale faster and better than traditional networks ever could.

Cheers,

Teng Yan

In the next part of this thesis series, we’ll dive into data networks and explore how they could break through AI’s looming data wall.

This report is intended solely for educational purposes and does not constitute financial advice. It is not an endorsement to buy or sell assets or make financial decisions. Always conduct your own research and exercise caution when making investment choices.
