The integration of AI and Web3 is unstoppable: a new pattern from Computing Power sharing to data incentives.

2025-07-29 22:08:57

AI+Web3: Towers and Squares

TL;DR

Web3 projects based on AI concepts have become targets for capital attraction in the primary and secondary markets.
The opportunities of Web3 in the AI industry are reflected in: using distributed incentives to coordinate potential supply in the long tail—across data, storage, and computing; while establishing open-source models and a decentralized market for AI Agents.
AI is mainly used in the Web3 industry for on-chain finance ( crypto payments, trading, data analysis ), and assisting development.
The utility of AI+Web3 is reflected in the complementarity of the two: Web3 is expected to counter AI centralization, while AI is expected to help Web3 break through barriers.

Introduction

In the past two years, the development of AI has been accelerated, and this butterfly effect triggered by ChatGPT has not only opened up a new world of generative artificial intelligence but has also stirred up a strong current in the Web3 field.

With the support of AI concepts, the funding boost in the crypto market, which has slowed down, is significant. According to statistics, in the first half of 2024 alone, 64 Web3 + AI projects completed funding, and the AI-based operating system Zyber365 achieved a maximum funding amount of 100 million USD in its Series A round.

The secondary market is more prosperous, with data from the crypto aggregation site Coingecko showing that within just over a year, the total market capitalization of the AI sector has reached $48.5 billion, with a 24-hour trading volume approaching $8.6 billion; the benefits brought by advancements in mainstream AI technology are evident, as the average price of the AI sector rose by 151% after the release of OpenAI's Sora text-to-video model; the AI effect also extends to one of the cryptocurrency capital-raising sectors, Meme: the first AI Agent concept MemeCoin—GOAT has rapidly gained popularity and achieved a valuation of $1.4 billion, successfully igniting the AI Meme boom.

The research and topics surrounding AI+Web3 are equally hot, from AI+Depin to AI Memecoin and now to current AI Agents and AI DAOs, the FOMO sentiment can no longer keep up with the speed of the new narrative rotation.

AI + Web3, this combination of terms filled with hot money, trends, and future fantasies, is inevitably seen as a marriage arranged by capital. It seems difficult for us to distinguish beneath this magnificent robe whether it is the playground of speculators or the eve of a dawn explosion.

To answer this question, a key consideration for both parties is whether it will get better with the other party involved. Can benefits be derived from the other party's model? In this article, we also attempt to build on the shoulders of our predecessors to examine this landscape: how Web3 can play a role in various aspects of the AI technology stack, and what new vitality AI can bring to Web3?

Part.1 What opportunities does Web3 have under the AI stack?

Before we delve into this topic, we need to understand the technology stack of AI large models:

In simpler terms, the whole process can be described as follows: the "large model" is like the human brain. In the early stages, this brain belongs to a newborn baby who has just come into the world and needs to observe and absorb a vast amount of information from the surroundings to understand this world. This is the data "collection" phase. Since computers do not possess human senses such as vision and hearing, before training, the large amounts of unlabelled information from the outside world need to be transformed into a format that computers can understand and utilize through "preprocessing."

After inputting the data, the AI constructs a model with understanding and predictive abilities through "training", which can be seen as the process of a baby gradually understanding and learning about the outside world. The parameters of the model are like the language abilities that a baby continuously adjusts during the learning process. When the content of learning starts to specialize, or when feedback is received through communication with others and corrections are made, it enters the "fine-tuning" stage of the large model.

As children grow and learn to speak, they can understand meanings and express their feelings and thoughts in new conversations. This stage is similar to the "reasoning" of large AI models, where the model can predict and analyze new language and text inputs. Infants express feelings, describe objects, and solve various problems through language skills. This is also akin to how large AI models, after completing training and being put into use, apply reasoning in various specific tasks, such as image classification and speech recognition.

The AI Agent is closer to the next form of large models - capable of independently executing tasks and pursuing complex goals, not only possessing the ability to think but also capable of memory, planning, and interacting with the world using tools.

Currently, in response to the pain points of AI across various stacks, Web3 has preliminarily formed a multi-layered, interconnected ecosystem that covers all stages of the AI model process.

1. Basic Layer: Computing Power and Data's Airbnb

Hashrate

Currently, one of the highest costs of AI is the computational power and energy required for training and inference models.

An example is that Meta's LLAMA3 requires 16,000 H100 GPUs produced by NVIDIA(, which is a top-tier graphics processing unit specifically designed for artificial intelligence and high-performance computing workloads, taking 30 days to complete training. The unit price of the latter's 80GB version ranges from $30,000 to $40,000, necessitating an investment of $400 million to $700 million in computing hardware), including GPUs and networking chips(. Meanwhile, the training consumes 1.6 billion kilowatt-hours per month, with energy expenditures nearing $20 million monthly.

The decompression of AI computing power is also the earliest intersection of Web3 and AI - DePin) decentralized physical infrastructure network(. Currently, the DePin Ninja data website has displayed over 1,400 projects, among which representative projects for GPU computing power sharing include io.net, Aethir, Akash, Render Network, and so on.

The main logic is that the platform allows individuals or entities with idle GPU resources to contribute their computing power in a decentralized manner without permission, through an online marketplace for buyers and sellers similar to Uber or Airbnb, thereby increasing the utilization of underused GPU resources, and end users can also obtain more cost-effective high-performance computing resources; at the same time, the staking mechanism ensures that if there is a violation of quality control mechanisms or network disruption, resource providers will face corresponding penalties.

Its characteristics lie in:

Gather idle GPU resources: The suppliers mainly include excess computing power resources from third-party independent small and medium-sized data centers, cryptocurrency mining farms, etc., with mining hardware based on the PoS consensus mechanism, such as FileCoin and ETH mining machines. Currently, there are also projects dedicated to launching devices with lower entry thresholds, such as exolab, which uses local devices like MacBook, iPhone, and iPad to establish a computing power network for running large model inference.
Facing the long-tail market of AI computing power:

a. "In terms of technology," the decentralized computing power market is more suitable for inference steps. Training relies more on the data processing capabilities brought by super-large cluster scale GPUs, while inference has relatively lower requirements for GPU computing performance, such as Aethir focusing on low-latency rendering work and AI inference applications.

b. In terms of demand, small and medium computing power demanders will not train their own large models separately, but will only choose to optimize and fine-tune around a few leading large models, and these scenarios are naturally suited for distributed idle computing power resources.

Decentralized Ownership: The technological significance of blockchain lies in the fact that resource owners always retain control over their resources, allowing for flexible adjustments based on demand while also earning profits.

)# Data

Data is the foundation of AI. Without data, computation is as useless as floating duckweed, and the relationship between data and models is akin to the saying "Garbage in, Garbage out". The quantity of data and the quality of input determine the final output quality of the model. In the training of current AI models, data dictates the model's language ability, comprehension ability, even values, and human-like performance. Currently, the data demand dilemma for AI mainly focuses on the following four aspects:

Data Hunger: AI model training relies on a large amount of data input. Public information shows that OpenAI trained GPT-4 with a parameter count reaching trillions.
Data Quality: With the integration of AI and various industries, the timeliness of data, diversity of data, specialization of vertical data, and the incorporation of emerging data sources such as social media sentiment have raised new requirements for its quality.
Privacy and Compliance Issues: Currently, various countries and companies are gradually recognizing the importance of high-quality datasets and are imposing restrictions on dataset scraping.
High cost of data processing: large data volume and complex processing. Public information shows that over 30% of AI companies' R&D costs are used for basic data collection and processing.

Currently, web3 solutions are reflected in the following four aspects:

Data Collection: The availability of free real-world data that can be crawled is rapidly diminishing, and AI companies' expenditures on data are increasing year by year. However, at the same time, this expenditure has not benefited the actual contributors of the data, as platforms fully enjoy the value creation brought by the data.

The vision of Web3 is to allow users who genuinely contribute to also participate in the value creation brought by data, and to obtain more private and valuable data from users in a low-cost manner through distributed networks and incentive mechanisms.

Grass is a decentralized data layer and network where users can run Grass nodes to contribute idle bandwidth and relay traffic in order to capture real-time data from the entire internet and earn token rewards;
Vana introduces a unique concept of data liquidity pool ###DLP(, where users can upload their private data ) such as shopping records, browsing habits, social media activities, etc. ( to a specific DLP, and flexibly choose whether to authorize these data for use by specific third parties.
In PublicAI, users can use )Web3 as a classification tag on X and @PublicAI to achieve data collection.

Data Preprocessing: In the process of AI data processing, the collected data is often noisy and contains errors, so it must be cleaned and converted into a usable format before training the model, involving standardization, filtering, and handling missing values, which are repetitive tasks. This stage is one of the few manual steps in the AI industry and has given rise to the profession of data annotators. As the model's demand for data quality increases, the threshold for data annotators has also risen, and this task is inherently suitable for the decentralized incentive mechanisms of Web3.

Currently, Grass and OpenLayer are both considering incorporating data annotation as a key component.
Synesis introduced the concept of "Train2earn", emphasizing data quality, where users can earn rewards by providing annotated data, comments, or other forms of input.
The data labeling project Sapien gamifies the labeling tasks and allows users to stake points to earn more points.

Data Privacy and Security: It is important to clarify that data privacy and security are two different concepts. Data privacy involves the handling of sensitive data, while data security protects data information from unauthorized access, destruction, and theft. Thus, the advantages of Web3 privacy technology and potential application scenarios are reflected in two aspects: #AI或#1( training of sensitive data; )2( data collaboration: multiple data owners can jointly participate in AI training without having to share their raw data.

Current common privacy technologies in Web3 include:

Trusted Execution Environment ) TEE (, such as Super Protocol;
Fully Homomorphic Encryption ) FHE (, for example BasedAI, Fhenix.io or Inco Network;
Zero-knowledge technology ) zk (, such as Reclaim Protocol using zkTLS technology, generates zero-knowledge proofs of HTTPS traffic, allowing users to securely import activity, reputation, and identity data from external websites without exposing sensitive information.

However, the field is still in its early stages, and most projects are still in exploration. One current dilemma is that the computing costs are too high, some examples are:

The zkML framework EZKL takes about 80 minutes to generate a proof for a 1M-nanoGPT model.
According to data from Modulus Labs, the overhead of zkML is more than 1000 times higher than pure computation.

Data Storage: After obtaining data, a place is needed to store the data on-chain, as well as the LLM generated using that data. With data availability )DA( as the core issue, before the Ethereum Danksharding upgrade, its throughput was 0.08MB. Meanwhile, training AI models and real-time inference typically require a data throughput of 50 to 100GB per second. This magnitude of difference makes existing on-chain solutions inadequate when facing 'resource-intensive AI applications.'

0g.AI is a representative project in this category. It is aimed at

View Original

This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.

9 Likes