Josh, Teng Yan
July 23, 2024

Vana: All You Need to Know

Vana is the Robin Hood of data, giving ownership back to the people

Vana is on a mission to revolutionise data ownership for training AI models.

Vana wants to:

  • Liberate data from walled gardens
  • Shift ownership of AI back to the user

It’s the Robin Hood of data, giving ownership back to the people.

User Data is Immensely Valuable.

User data is used to personalize products, provide targeted marketing, and keep an edge over competitors.

With the rise of AI and model training, the value of user data has multiplied many times over.

Unfortunately, big tech companies monopolize and keep this private data for themselves. Companies like Reddit and Twitter have closed off access to their developer APIs to stop others from training on their data.

Thankfully, data privacy laws retain users’ right to their data. Vana leverages this.

If enough users willingly export their data and make it publicly available, could this create the largest, most comprehensive data treasury in the world?

Source: Vana

Users collectively hold ~100 times the data used to train GPT-4. Imagine the capabilities of the models that are trained on this data.

This vast reservoir of high-quality data, such as messages from Instagram and Reddit, could significantly enhance AI model performance.

Vana’s Secret Sauce — the Data DAO

Why would users willingly contribute their data?

The answer lies in our favourite word in crypto: incentives!

Big tech generates billions in revenue from harnessing user data. Imagine if users could own a share of the profits their data helped to create. Vana solves this through a concept called Data DAOs.

Source: Vana

The Data DAO allows users to pool and govern their data, rewarding them with a token representing ownership of the particular dataset.

The DAO decides what to do with the data, such as renting it out for training purposes or selling copies of the data.
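To make the governance step concrete, here is a minimal sketch of how a token-weighted vote on a dataset proposal could be tallied. It assumes that voting power equals a member's dataset-token balance and that the option with the most weight wins. The function name, wallet names, balances, and proposal options are all hypothetical and do not describe Vana's actual governance mechanism.

```python
from collections import defaultdict


def tally_vote(balances: dict[str, float], votes: dict[str, str]) -> dict[str, float]:
    """Sum token-weighted votes per option.

    Assumption: one dataset token equals one unit of voting power.
    Wallets that hold no tokens contribute zero weight.
    """
    totals: dict[str, float] = defaultdict(float)
    for wallet, option in votes.items():
        totals[option] += balances.get(wallet, 0.0)
    return dict(totals)


# Hypothetical token balances earned by contributing data to the DAO.
balances = {"alice": 120.0, "bob": 30.0, "carol": 50.0}
# Hypothetical proposal: what should the DAO do with the pooled dataset?
votes = {"alice": "rent_for_training", "bob": "sell_copy", "carol": "sell_copy"}

result = tally_vote(balances, votes)
winner = max(result, key=result.get)
print(result, winner)
```

In this toy example a single large contributor outvotes two smaller ones, which is the usual trade-off of token-weighted governance.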

Some of the Data DAOs on Vana’s testnet include:

  • Finquarium (Financial)
  • Flirtual (Dating)
  • Volara (Twitter)
  • Reddit DAO, the largest with over 140K users

What Role does Crypto Play?

The Vana network is an EVM-compatible L1 that optimizes for data transactions.

Source: Vana

Users first upload their data to the relevant Data Liquidity Pool (DLP), which works somewhat like a subnet.

Upon upload, the user’s data is encrypted, and this transaction is recorded on Vana. The encrypted data then needs to be verified by the validators to ensure its quality and integrity.

This is done through Proof-of-Contribution, a valuation metric specific to each dataset. For example, the Reddit DAO uses karma as a contribution metric. Once the data is validated, another transaction is recorded on Vana, and the data is added to the DLP.

Those looking to train models can then purchase access to the DLP APIs, or the DAO can sell the data to buyers at its discretion. Users can see their data contributions through their own EVM wallet.
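The flow described above (upload, record, validate, record again, add to the pool) can be sketched as a toy model. This is only an illustration under stated assumptions: the class `ToyDLP`, its methods, the `min_score` threshold, and the use of a SHA-256 hash as a stand-in for an encrypted data reference are all hypothetical and are not Vana's API.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ToyDLP:
    """Toy model of a DLP. All names are hypothetical, not Vana's API."""
    name: str
    min_score: float  # assumed acceptance threshold for Proof-of-Contribution
    ledger: list = field(default_factory=list)    # stands in for on-chain transactions
    accepted: dict = field(default_factory=dict)  # wallet -> total contribution score

    def upload(self, wallet: str, raw_data: bytes) -> str:
        # A real system would encrypt the payload. Here a hash is recorded
        # purely as a placeholder reference to the encrypted blob.
        data_ref = hashlib.sha256(raw_data).hexdigest()
        self.ledger.append(("upload", wallet, data_ref))
        return data_ref

    def validate(self, wallet: str, data_ref: str, score: float) -> bool:
        # `score` stands in for a dataset-specific Proof-of-Contribution
        # metric (the article's example is karma for the Reddit DAO).
        if score < self.min_score:
            return False  # rejected: nothing is recorded or added to the pool
        self.ledger.append(("validated", wallet, data_ref))
        self.accepted[wallet] = self.accepted.get(wallet, 0.0) + score
        return True


pool = ToyDLP(name="reddit-demo", min_score=10)
ref = pool.upload("0xabc", b"exported posts and comments")
ok = pool.validate("0xabc", ref, score=250)
print(ok, len(pool.ledger), pool.accepted)
```

The two ledger entries mirror the two transactions the article mentions: one at upload and one after validation.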

The VANA token will be used to pay transaction fees and govern the network. Block rewards are distributed as follows:

  • 70% of block rewards to the top 16 DLPs based on metrics like transactions facilitated and verified data
  • 30% of block rewards to Propagators (Validators on the root network)
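The split above can be worked through with a small example. The 70/30 ratio and the top-16 cutoff come from the article. Everything else is an assumption for illustration: the function name, the single combined `score` per DLP, the reward amount, and in particular the pro rata division among the top DLPs, since the article does not say how the 70% is shared between them.

```python
def split_block_reward(reward: float, dlp_scores: dict[str, float], top_n: int = 16):
    """Split one block reward between the top DLPs and Propagators.

    Assumptions (not confirmed by the source): each DLP has one combined
    score, and the DLP share is divided pro rata by that score.
    """
    # Keep only the top_n DLPs by score.
    top = sorted(dlp_scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    dlp_pot = reward * 70 / 100          # 70% to the top DLPs
    propagator_pot = reward - dlp_pot    # remaining 30% to Propagators
    total_score = sum(score for _, score in top)
    if total_score <= 0:
        return {}, propagator_pot
    dlp_rewards = {name: dlp_pot * score / total_score for name, score in top}
    return dlp_rewards, propagator_pot


# Hypothetical scores for three DLPs and a hypothetical reward of 100 tokens.
scores = {"reddit": 3.0, "twitter": 1.0, "dating": 0.5}
dlp_rewards, propagator_pot = split_block_reward(100.0, scores, top_n=2)
print(dlp_rewards, propagator_pot)
```

With `top_n=2`, the lowest-scoring DLP receives nothing, which illustrates why the limited number of reward slots makes DLP creation competitive.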

🌈 Research-Level Alpha

Vana launched its Satori testnet on June 11. You can earn rewards by participating in the testnet in several ways. It’s early days, so there are still few active participants.

  1. Create a DLP: the pool to which users can contribute data. This is competitive, and only 16 DLP slots are available.
  2. Run a Validator: validate the quality of data contributions. It takes about 2 hours to set up, and you can run validators for multiple DLPs. You can register here.
  3. Submit Test Data: contribute data to a DLP. I connected my Twitter account to this, and it took less than two minutes.

The Team

Founder Anna Kazlauskas was previously a core engineer at Celo blockchain before founding Vana. She graduated from MIT with degrees in both computer science and economics.

Arthur Abal (COO) and Colin Stevenson (Head of Data) were both previously at Appen, a company specializing in high-quality, human-annotated data for machine learning and AI.

Matthias Knauth (Head of Product) was Head of Product at both Credmark and First Coin GmbH.

The team has raised $20M in funding from Paradigm and other notable investors.

At the Imagination in Action summit, Anna Kazlauskas highlighted:

“Yeah, so in summary, I think foundation models, they really tend towards monopolies. They require these huge upfront investments in the form of research, data, and compute.

And it's very tempting, I think, for the open source AI community, or more broadly, anyone who's not big tech, to just sort of settle and say like, okay, we're going to do the best we can with the last generation of models that big tech companies open source to us and give us access to.

But we really don't need to settle for being a few generations behind, right? You can actually have a collective of users create their own best model because we have the data and the compute to make it possible.”

Our Thoughts

  • Data DAOs are a promising concept. Using cryptoeconomic incentives to bootstrap a valuable network is arguably one of the most compelling use cases for crypto. Here’s a great article on the opportunities and challenges in Data DAOs by Variant Fund.
  • The primary challenge for Vana lies in scaling user contributions. With 140K users in the Reddit DAO, Vana has made a significant start, but it’s still far from achieving critical mass for the data pool to be useful.
  • Another data product, Grass, leverages a passive process by piggybacking off a user’s idle bandwidth to scrape the internet. Vana, however, faces a bigger obstacle as it is an active process: users must first recognize the value of their data and then take action to contribute.
  • The actual value of individual user data is uncertain. For instance, if you are an infrequent Reddit poster and you contribute your data to the network, how much will you earn? If the earnings are minimal (e.g., a few dollars), it won’t be sufficient to incentivize widespread data contributions.
  • A critical aspect to watch is how Vana creates demand drivers for the VANA token. The team must generate enough attention and liquidity for the token to spark a flywheel of network growth to reach the scale it requires.
  • The success of Vana hinges on its ability to cultivate an ecosystem where data contributions and token incentives synergize effectively, driving sustained engagement and expansion.

Cheers,

Teng Yan & Joshua

This research is intended solely for educational purposes and does not constitute financial advice. It is not an endorsement to buy or sell assets or make financial decisions. Always conduct your own research and exercise caution when making investment choices.
