Decentralised compute is the backbone of Crypto AI — GPU marketplaces, training & inference.
I haven’t been able to shake off one big miss.
It still haunts me because it was the most obvious bet for anyone paying attention, yet I didn’t invest a single dollar.
No, it wasn’t the next Solana killer or a memecoin with a dog wearing a funny hat.
It was… NVIDIA.
NVDA share price year-to-date. Source: Google
In just one year, NVDA 3x’d, soaring from a $1T to a $3T market cap. It even outperformed Bitcoin over the same period.
Sure, some of that is AI hype. But a huge part of it is grounded in reality. NVIDIA reported $60B in revenue for FY2024, a staggering 126% increase from 2023. This growth was driven by Big Tech snapping up GPUs in a global AI arms race to AGI.
So why did I miss it?
For two years, I was laser-focused on crypto and didn’t look outside to what was happening in AI. That was a big mistake, and it still eats at me.
But I’m not making the same mistake twice.
Today, Crypto AI feels eerily similar. We’re on the brink of an innovation explosion. The parallels to the California Gold Rush of the mid-1800s are hard to ignore—industries and cities sprang up overnight, infrastructure advanced at breakneck speed, and fortunes were made by those who dared to leap.
Like NVIDIA in its early days, Crypto AI will feel obvious in hindsight.
In Part I of my thesis, I explained why Crypto AI is today's most exciting underdog opportunity for investors and builders.
Here’s a quick recap:
At its core, Crypto AI is AI with crypto infrastructure layered on top. This means it’s more likely to track AI’s exponential growth trajectory than the broader crypto market. So, to stay ahead, you’ve got to tune into the latest AI research on arXiv and talk to founders who believe they’re building the next big thing.
In Part II of my thesis, I’ll dive into four of the most promising subsectors in Crypto AI:
This piece represents the culmination of weeks of deep research and conversations with founders and teams across the Crypto AI landscape. It’s not designed to be an exhaustive deep dive into every sector—that’s a rabbit hole for another day.
Instead, consider it a high-level roadmap crafted to spark curiosity, sharpen your research, and guide investment thinking.
I picture the decentralised AI stack as a layered ecosystem: it starts with decentralised compute and open data networks on one end, which power decentralised AI model training.
Every inference is then verified—inputs and outputs alike—using a combination of cryptography, cryptoeconomic incentives, and evaluation networks. These verified outputs flow into AI agents that can operate autonomously on-chain, as well as consumer and enterprise AI applications that users can actually trust.
Coordination networks tie it all together, enabling seamless communication and collaboration across the ecosystem.
In this vision, anyone building in AI could tap into one or more layers of this stack, depending on their specific needs. Whether leveraging decentralised compute for model training or using evaluation networks to ensure high-quality outputs, the stack offers a range of options.
Thanks to blockchain’s inherent composability, I believe we are naturally moving toward a modular future. Each layer is becoming hyper-specialized, with protocols optimized for distinct functions rather than an all-in-one integrated approach.
There’s been a Cambrian explosion of startups building across every layer of the decentralised AI stack, most founded in just the last 1-3 years. It’s clear: we’re still early.
The most comprehensive and up-to-date map of the Crypto AI startup landscape I’ve seen is maintained by Casey and her team over at topology.vc. It’s an invaluable resource for anyone tracking the space.
As I dive into the Crypto AI subsectors, I’m constantly asking myself: how big is the opportunity here? I’m not interested in small bets—I’m looking for markets that can scale into hundreds of billions.
Let’s start with the market size. When evaluating a subsector, I ask myself: is it creating a brand-new market or disrupting an existing one?
Take decentralised compute, for instance. It’s a disruptive category whose potential can be estimated by looking at the established cloud computing market, worth ~$680B today and expected to reach $2.5T in 2032.
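To make the sizing concrete, here’s a trivial back-of-envelope sketch in Python of what different capture rates of that cloud market would imply. The capture rates are arbitrary scenarios I picked for illustration, not forecasts.

```python
# Back-of-envelope sizing: the market figures come from the estimates above;
# the capture rates are arbitrary scenarios, not predictions.
cloud_market_today_b = 680      # ~$680B cloud computing market today
cloud_market_2032_b = 2_500     # ~$2.5T projected by 2032

for capture in (0.005, 0.01, 0.05):
    today = cloud_market_today_b * capture
    later = cloud_market_2032_b * capture
    print(f"{capture:.1%} capture: ${today:.0f}B today -> ${later:.0f}B by 2032")
```

Even a sub-1% slice of that market is a multi-billion-dollar outcome, which is why the disruptive categories are easier for me to underwrite than brand-new ones.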
New markets with no precedents, like AI agents, are tougher to quantify. Without historical data, sizing them up involves a mix of educated guesses and gut checks on the problems they’re solving. And the pitfall is that sometimes, what looks like a new market is really just a solution looking for a problem.
Timing is everything. Technology tends to improve and become cheaper over time, but the pace of progress varies.
How mature is the technology in a given subsector? Is it ready to scale, or is it still in the research phase, with practical applications years away? Timing determines whether a sector deserves immediate attention or should be left in the “wait and see” category.
Take Fully Homomorphic Encryption (FHE) as an example: the potential is undeniable, but today it’s still too slow for widespread use. We’re likely several years out from seeing it hit mainstream viability. By focusing on sectors closer to scaling first, I can spend my time and energy where the momentum—and opportunity—is building.
If I were to map these categories on a size vs. timing chart, it would look something like this. Keep in mind that this is more of a conceptual sketch than a hard-and-fast guide. There are plenty of nuances; for example, within verifiable inference, different approaches like zkML and opML are at different levels of readiness.
That said, I am convinced that AI’s scale will be so massive that even what looks “niche” today could evolve into a significant market.
It’s also worth noting that technological progress doesn’t always follow a straight line—it often happens in leaps. My views on timing and market size will shift when emergent breakthroughs occur.
With this framework in mind, let’s break down each sub-sector.
Several Crypto AI teams are positioning themselves to capitalize on the shortage of GPUs relative to demand by building decentralised networks that tap into the global pool of latent compute power.
The core value proposition for GPU marketplaces is threefold:
To tackle the supply side of the market, these marketplaces source compute from:
On the other hand, the demand side for decentralised compute today comes from:
The key thing to remember: developers always prioritise costs and reliability.
Startups in this space often tout the size of their GPU supply networks as a sign of success. But this is misleading—it is a vanity metric at best.
The real constraint is not supply but demand. The key metrics to track aren’t the number of GPUs available, but rather the utilization rate and the number of GPUs actually rented out.
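To make this concrete, here’s a minimal sketch of the gap between the vanity metric and the metrics that matter, using made-up numbers:

```python
# Illustrative only: the listed supply is the vanity metric; rented GPUs and
# utilization are what actually matter. All numbers below are invented.
gpus_listed = 12_000        # total GPUs advertised on the network (assumed)
gpus_rented = 900           # GPUs actively rented by paying users (assumed)
gpu_hours_available = gpus_listed * 24
gpu_hours_billed = 18_500   # GPU-hours billed over the same day (assumed)

utilization_rate = gpus_rented / gpus_listed
hourly_utilization = gpu_hours_billed / gpu_hours_available

print(f"GPUs rented:          {gpus_rented}/{gpus_listed} ({utilization_rate:.1%})")
print(f"GPU-hour utilization: {hourly_utilization:.1%}")
```

A network boasting 12,000 GPUs with single-digit utilization is a very different business from one with 2,000 GPUs running near capacity.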
Tokens are excellent at bootstrapping the supply side, creating the incentives necessary to scale up quickly. However, they don’t inherently solve the demand problem. The real test is getting the product to a good enough state where latent demand materializes.
Haseeb Qureshi (Dragonfly) puts it best:
Having a token does not magically bootstrap network effects. This was the old mantra in crypto and some people seem to still believe it.
At best, a token can help bootstrap the supply side of a network. But the demand side comes from a great product & GTM, not from a token.
— Haseeb >|< (@hosseeb)
5:46 PM • Sep 9, 2024
Contrary to popular belief, the biggest hurdle for web3 distributed GPU marketplaces today is simply getting them to work properly.
This isn’t a trivial problem.
Orchestrating GPUs across a distributed network is complex, with layers of challenges—resource allocation, dynamic workload scaling, load balancing across nodes and GPUs, latency management, data transfer, fault tolerance, and handling diverse hardware scattered across various geographies. I could go on and on.
Achieving this requires serious engineering and a robust, properly designed network architecture.
To put it in perspective, consider Google’s Kubernetes. It’s widely regarded as the gold standard for container orchestration, automating processes like load balancing and scaling in distributed environments—very similar challenges to those faced by distributed GPU networks. Kubernetes itself was built on over a decade of Google’s experience, and even then, it took years of relentless iteration to get right.
Some of the GPU compute marketplaces that are already live today can handle small-scale workloads, but the cracks start to show as soon as they try to scale. I suspect this is because they were built on poorly designed architectural foundations.
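To give a flavour of just one slice of the orchestration problem, here’s a toy placement heuristic in Python. The node and job fields are invented for illustration; real orchestrators (Kubernetes included) juggle far more constraints than “enough VRAM, lowest latency”.

```python
# Toy scheduler: match jobs to heterogeneous nodes. Everything here is a
# simplification; production systems handle preemption, queuing, failures,
# bandwidth, pricing, and much more.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    free_vram_gb: int
    latency_ms: float   # latency to the requesting client

@dataclass
class Job:
    name: str
    vram_gb: int

def schedule(job: Job, nodes: list[Node]) -> Node | None:
    """Greedy placement: among nodes with enough VRAM, pick the lowest-latency one."""
    candidates = [n for n in nodes if n.free_vram_gb >= job.vram_gb]
    if not candidates:
        return None  # a real system would queue, preempt, or scale out
    best = min(candidates, key=lambda n: n.latency_ms)
    best.free_vram_gb -= job.vram_gb
    return best

nodes = [Node("us-gamer-rig", 24, 40.0), Node("eu-datacenter", 80, 120.0)]
print(schedule(Job("sd-inference", 16), nodes))
```

Multiply this by thousands of unreliable, geographically scattered nodes and constantly shifting workloads, and you get a sense of why architecture matters so much.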
Another challenge/opportunity for decentralised compute networks is ensuring trustworthiness: verifying that each node is actually providing the compute power it claims. Currently, this relies on the network's reputation, and in some cases, compute providers are ranked by reputation scores. Blockchain seems to be a natural fit for trustless verification systems. Startups like Gensyn and Spheron are pushing for a trustless approach to solving this issue.
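As a rough illustration of the idea (and emphatically not Gensyn’s or Spheron’s actual protocol), a network can spot-check providers by re-running a random sample of their claimed work and comparing outputs. Everything below is a toy stand-in for real verification schemes:

```python
# Illustrative spot-check: re-execute a random slice of a provider's claimed
# work on a trusted or redundant node and compare the results.
import hashlib
import random

def run_task(task_input: bytes) -> bytes:
    """Stand-in for the actual compute (e.g. an inference call)."""
    return hashlib.sha256(task_input).digest()

def spot_check(claimed_results: dict[bytes, bytes], sample_rate: float = 0.1) -> bool:
    """Re-run a random sample of tasks and compare against the claimed outputs."""
    k = max(1, int(len(claimed_results) * sample_rate))
    sample = random.sample(list(claimed_results), k)
    return all(run_task(task) == claimed_results[task] for task in sample)

# A provider submits results for 100 tasks; the verifier audits ~10% of them.
tasks = [f"task-{i}".encode() for i in range(100)]
claimed = {t: run_task(t) for t in tasks}   # an honest provider in this example
print("passes audit:", spot_check(claimed))
```

Pair random audits with staking and slashing, and cheating becomes economically irrational; that is exactly the kind of mechanism blockchains are good at enforcing.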
Today, many web3 teams are still navigating these challenges, meaning the opportunity is wide open.
How big is the market for decentralised compute networks?
Today, it’s probably just a tiny fraction of the $680B - $2.5T cloud computing industry. Yet, despite the added friction for users, there will always be some demand as long as costs stay lower than those of traditional providers.
I believe costs will remain lower in the near-to-mid term due to a mix of token subsidies and the unlocking of supply from users who aren’t price-sensitive (for example, if I can rent out my gaming laptop for extra cash, I’m happy whether it’s $20 or $50 a month).
But the true growth potential for decentralised compute networks—and the real expansion of their TAM—will come when:
Decentralised, permissionless compute stands as the base layer—the foundational infrastructure—for a decentralised AI ecosystem.
Despite the ongoing expansion in the supply chain for silicon (i.e. GPUs), I believe we’re only at the dawn of humanity’s Intelligence era. There will be an insatiable demand for compute.
Watch for the inflection point that could trigger a major re-rating of all working GPU marketplaces. It’s probably coming soon.
Picture this: a massive, world-changing AI model, not developed in secretive elite labs but brought to life by millions of everyday people. Gamers, whose GPUs normally churn out cinematic explosions in Call of Duty, now lend their hardware to something grander: an open-source, collectively owned AI model with no central gatekeepers.
In this future, foundation-scale models aren’t just the domain of the top AI labs.
But let’s ground this vision in today’s reality. For now, the lion’s share of heavyweight AI training remains anchored in centralized data centres, and this will likely be the norm for some time.
Companies like OpenAI are scaling up their massive clusters. Elon Musk recently announced that xAI is nearing the completion of a data centre boasting the equivalent of 200,000 H100 GPUs.
But it’s not only about the raw GPU count. Model FLOPS utilization (MFU)—a metric introduced in Google’s PaLM paper in 2022—tracks how effectively a GPU’s maximum capacity is used. Surprisingly, MFU often hovers around 35-40%.
Why so low? While GPU performance has skyrocketed over the years following Moore’s law, network, memory, and storage improvements have lagged behind significantly, creating bottlenecks. As a result, GPUs frequently sit idle, waiting for data.
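For intuition, here’s how MFU is typically computed, using the common ~6 FLOPs-per-parameter-per-token approximation for training. The cluster numbers below are illustrative assumptions I picked to land in the 35-40% range, not measurements:

```python
# MFU = observed model FLOPs per second / peak hardware FLOPs (PaLM-style).
PARAMS = 70e9                  # model size (assumed)
TOKENS_PER_SECOND = 56_000     # observed cluster-wide training throughput (assumed)
NUM_GPUS = 64                  # cluster size (assumed)
PEAK_FLOPS_PER_GPU = 989e12    # roughly H100-class dense BF16 peak

# Common approximation: training costs ~6 FLOPs per parameter per token
# (forward + backward pass).
model_flops_per_second = 6 * PARAMS * TOKENS_PER_SECOND
peak_flops = NUM_GPUS * PEAK_FLOPS_PER_GPU

mfu = model_flops_per_second / peak_flops
print(f"MFU ≈ {mfu:.1%}")
```

In other words, even a well-tuned cluster leaves most of its theoretical compute on the table, largely because the GPUs are waiting on data.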
AI training remains highly centralized today because of one word — Efficiency.
Training large models depends on techniques like:
• Data parallelism: Splitting datasets across multiple GPUs to perform operations in parallel, accelerating the training process.
• Model parallelism: Distributing parts of the model across GPUs to bypass memory constraints.
These methods require GPUs to exchange data constantly, making interconnect speed—the rate at which data is transferred across computers in the network—absolutely essential.
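Here’s a framework-free toy sketch of the data-parallel pattern: each worker computes a gradient on its own shard, then all workers synchronise before updating. Real training uses a framework (e.g. PyTorch’s distributed data parallelism); this just shows why that sync step makes interconnect speed matter.

```python
# Toy data parallelism on a 1-parameter linear model, y = w * x.
import random

NUM_WORKERS = 4        # simulated GPUs
LEARNING_RATE = 0.1

def local_gradient(weight, batch):
    """Gradient of mean squared error for y = w * x on one worker's data shard."""
    grad = 0.0
    for x, y in batch:
        grad += 2 * (weight * x - y) * x
    return grad / len(batch)

# Synthetic dataset: y = 3x plus noise, split evenly across workers.
data = [(i / 100, 3 * (i / 100) + random.uniform(-0.05, 0.05)) for i in range(100)]
shards = [data[i::NUM_WORKERS] for i in range(NUM_WORKERS)]

weight = 0.0
for step in range(300):
    # Each worker computes a gradient on its own shard (in parallel on real GPUs)...
    grads = [local_gradient(weight, shard) for shard in shards]
    # ...then every worker synchronises (an "all-reduce") before the update.
    weight -= LEARNING_RATE * sum(grads) / NUM_WORKERS

print(f"learned weight ≈ {weight:.2f} (target 3.0)")
```

That “all-reduce” line runs at every single step. On a frontier model, it means shipping gigabytes of gradients between GPUs thousands of times per hour.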
When frontier AI model training can cost upwards of $1B, every efficiency gain matters.
With their high-speed interconnects, centralised data centres enable rapid data transfer between GPUs and achieve substantial cost savings during training that decentralised setups can’t match… yet.
If you talk with people working in the AI space, many will tell you that decentralised training just won’t work.
In decentralised setups, GPU clusters aren’t physically co-located, so transferring data between them is much slower and becomes a bottleneck. Training requires GPUs to sync and exchange data at each step. The farther apart they are, the higher the latency. Higher latency means slower training speed and higher costs.
What might take a few days in a centralized data centre could stretch to two weeks with a decentralised approach at a higher cost. That’s simply not viable.
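A crude back-of-envelope model makes the gap tangible. The gradient size, compute time, and link profiles below are assumptions, and real systems overlap compute with communication, so treat the ratio as an upper bound rather than a prediction:

```python
# Simplified model of one training step: compute time plus gradient exchange.
GRADIENT_GIGABYTES = 20.0        # gradients exchanged per step (assumed)
COMPUTE_SECONDS_PER_STEP = 1.0   # forward/backward pass time (assumed)

def step_time(bandwidth_gbps, round_trip_latency_s):
    comm = GRADIENT_GIGABYTES * 8 / bandwidth_gbps + round_trip_latency_s
    return COMPUTE_SECONDS_PER_STEP + comm

datacenter = step_time(bandwidth_gbps=400, round_trip_latency_s=0.0001)  # NVLink/InfiniBand class
internet   = step_time(bandwidth_gbps=1,   round_trip_latency_s=0.05)    # consumer broadband class

print(f"co-located step:    {datacenter:.2f}s")
print(f"over-internet step: {internet:.2f}s  ({internet / datacenter:.0f}x slower)")
```

Closing that gap is exactly what techniques like gradient compression and asynchronous updates (the focus of much of the research below) are trying to do.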
But this is set to change.
The good news is that there’s been a massive surge of interest in research around distributed training. Researchers are exploring multiple approaches simultaneously, as evidenced by the flurry of studies and published papers. These advances will stack and compound, accelerating progress in the space.
It’s also about testing in production and seeing how far we can push boundaries.
Some decentralised training techniques can already handle smaller models in slow interconnect environments. Now, frontier research is pushing to extend these methods to ever-larger models.
Releasing INTELLECT-1: We’re open-sourcing the first decentralized trained 10B model:
- INTELLECT-1 base model & intermediate checkpoints
- Pre-training dataset
- Post-trained instruct models by @arcee_ai
- PRIME training framework
- Technical paper with all details
— Prime Intellect (@PrimeIntellect)
9:18 PM • Nov 29, 2024
Nous Research announces the pre-training of a 15B parameter language model over the internet, using Nous DisTrO and heterogeneous hardware contributed by our partners at @Oracle, @LambdaAPI, @NorthernDataGrp, @CrusoeCloud, and the Andromeda Cluster.
This run presents a loss… x.com/i/web/status/1…
— Nous Research (@NousResearch)
4:34 PM • Dec 2, 2024
Another challenge is managing a diverse range of GPU hardware, including consumer-grade GPUs with limited memory that are typical in decentralised networks. Techniques like model parallelism (splitting model layers across devices) can help make this feasible.
Current decentralised training methods still cap out at model sizes well below the frontier (GPT-4 is reportedly at close to a trillion parameters, 100x larger than Prime Intellect’s 10B model). To truly scale, we will need breakthroughs in model architecture, better networking infrastructure, and smarter task-splitting across devices.
And we can dream big. Imagine a world where decentralised training aggregates more GPU compute power than even the largest centralized data centres could ever muster.
Pluralis Research (a sharp team in decentralised training, one to watch closely) argues that this isn’t just possible—it’s inevitable. Centralized data centres are bound by physical constraints like space and the availability of power, while decentralised networks can tap into an effectively limitless pool of global resources.
Even NVIDIA’s Jensen Huang has acknowledged that async decentralised training could unlock the true potential of AI scaling. Distributed training networks are also more fault-tolerant.
So in one potential future, the world's most powerful AI models will be trained in a decentralised fashion.
It’s an exciting prospect, but I’m not yet fully convinced. We need stronger evidence that decentralised training of the largest models is technically and economically viable.
Here’s where I see immense promise: Decentralised training’s sweet spot could lie in smaller, specialized, open-source models designed for targeted use cases, rather than competing with the ultra-large, AGI-driven frontier models. Certain architectures, especially non-transformer models, are already proving a natural fit for decentralised setups.
And there’s another piece to this puzzle: tokens. Once decentralised training becomes feasible at scale, tokens could play a pivotal role in incentivizing and rewarding contributors, effectively bootstrapping these networks.
The road to this vision is long, but progress is deeply encouraging. Advances in decentralised training will benefit everyone—even big tech and top-tier AI research labs—as the scale of future models will outgrow the capacity of a single data centre.
The future is distributed. And when a technology holds such broad potential, history shows it always gets better, faster, than anyone expects.
Right now, the majority of compute power in AI is being funnelled into training massive models. Top AI labs are in an arms race to develop the best foundational models and ultimately achieve AGI.
But here’s my take: this intense compute focus on training will shift towards inference in the coming years. As AI becomes increasingly embedded in the applications we use daily—from healthcare to entertainment—the compute resources needed to support inference will be staggering.
And it’s not just speculation. Inference-time compute scaling is the latest buzzword in AI. OpenAI recently released a preview/mini version of its latest model, o1 (codename: Strawberry), and the big shift? The model takes its time to think: it first asks itself what steps it should take to answer the question, then works through each of them.
This model is designed for more complex, planning-heavy tasks, like solving crossword puzzles, and for problems that require deeper reasoning. You’ll notice it’s slower, taking more time to generate responses, but the results are far more thoughtful and nuanced. It’s also much more expensive to run (25x the cost of GPT-4).
The shift in focus is clear: the next leap in AI performance won’t come just from training bigger models but also from scaling up compute use during inference.
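One simple flavour of inference-time scaling is best-of-N sampling: spend more compute per query by generating several candidate answers and keeping the best-scored one. The generator and scorer below are placeholders, and this is not a claim about how o1 works internally:

```python
# Toy best-of-N sampling: more samples = more inference compute = (with a good
# scorer) better answers. Both functions are stand-ins for real models.
import random

def generate_candidate(prompt: str) -> str:
    """Placeholder for sampling one answer from a language model."""
    return f"candidate-{random.randint(0, 9)} for: {prompt}"

def score(prompt: str, answer: str) -> float:
    """Placeholder for a verifier / reward model scoring an answer."""
    return random.random()

def best_of_n(prompt: str, n: int) -> str:
    candidates = [generate_candidate(prompt) for _ in range(n)]
    return max(candidates, key=lambda a: score(prompt, a))

print(best_of_n("Solve the crossword clue: 'river in Egypt' (4)", n=16))
```

The key point: every extra candidate is an extra inference call, which is precisely the kind of parallel, latency-tolerant workload that can be spread across many machines.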
If you want to read more, several research papers demonstrate:
Once powerful models are trained, their inference tasks—where the models do stuff—can be offloaded to decentralised compute networks. This makes so much sense because:
M4 Mac AI Coding Cluster
Uses @exolabs to run LLMs (here Qwen 2.5 Coder 32B at 18 tok/sec) distributed across 4 M4 Mac Minis (Thunderbolt 5 80Gbps) and a MacBook Pro M4 Max.
Local alternative to @cursor_ai (benchmark comparison soon).
— Alex Cheema - e/acc (@alexocheema)
7:37 AM • Nov 12, 2024
Think of decentralised inference like a CDN (content delivery network) for AI: instead of delivering websites quickly by connecting to nearby servers, decentralised inference taps into local compute power to deliver AI responses in record time. By embracing decentralised inference, AI apps become more efficient, responsive, and reliable.
The trend is clear. Apple’s new M4 Pro chip rivals NVIDIA’s RTX 3070 Ti—a GPU that, until recently, was the domain of hardcore gamers. The hardware we already have is increasingly capable of handling advanced AI workloads.
For decentralised inference networks to succeed, there must be compelling economic incentives for participation: nodes need to be compensated for their compute contributions, and the system must distribute rewards fairly and efficiently. Geographical diversity is also essential, since it reduces latency for inference tasks and improves fault tolerance.
And the best way to build decentralised networks? Crypto.
Tokens provide a powerful mechanism for aligning participants' interests, ensuring everyone is working toward the same goal: scaling the network and driving up the token’s value.
Tokens also supercharge network growth. They help solve the classic chicken-and-egg problem that stalls most networks by rewarding early adopters and driving participation from day one.
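As a sketch of those incentive mechanics (with invented numbers), imagine a fixed pool of tokens per epoch split pro-rata by verified compute contributed. Real networks layer slashing, reputation, and emission schedules on top; this only shows the core alignment idea:

```python
# Toy epoch reward distribution: tokens flow in proportion to verified work.
EPOCH_REWARD_TOKENS = 10_000.0

def distribute_rewards(verified_gpu_hours: dict[str, float]) -> dict[str, float]:
    total = sum(verified_gpu_hours.values())
    if total == 0:
        return {node: 0.0 for node in verified_gpu_hours}
    return {
        node: EPOCH_REWARD_TOKENS * hours / total
        for node, hours in verified_gpu_hours.items()
    }

contributions = {"node-a": 120.0, "node-b": 30.0, "node-c": 0.0}
print(distribute_rewards(contributions))  # node-a earns 4x node-b; node-c earns nothing
```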
The success of Bitcoin and Ethereum proves this point—they’ve already aggregated the largest pools of computing power on the planet.
Decentralised inference networks are next in line. With geographical diversity, they reduce latency, improve fault tolerance, and bring AI closer to the user. And with crypto-powered incentives, they’ll scale faster and better than traditional networks ever could.
Cheers,
Teng Yan
In the next part of this thesis series, we’ll dive into data networks and explore how they could break through AI’s looming data wall.
This report is intended solely for educational purposes and does not constitute financial advice. It is not an endorsement to buy or sell assets or make financial decisions. Always conduct your own research and exercise caution when making investment choices.