Blending the Best of Open and Closed AI Models
Today we have a guest post by Moyed, with editorial contributions from Teng Yan. We love supporting smart, young researchers in the space. The post is also published on his site on Paragraph.
Today, I would like to introduce Sentient, one of the most anticipated projects in Crypto AI. I was genuinely curious whether it’s worth the $85 million raised in their seed round, led by Peter Thiel’s Founders Fund.
I chose Sentient because, while reading its whitepaper, I discovered that it uses Model Fingerprinting, a technique I had learned about in an AI Safety course. I kept reading and thought, ‘Well, it may be worth sharing.’
Today, we’re distilling the key concepts from their hefty 59-page whitepaper into a quick 10-minute read. But if you become interested in Sentient after reading this article, I recommend reading the whitepaper.
To introduce Sentient in one sentence, it is a platform for ‘Clopen’ AI models.
Clopen here means Closed + Open, representing AI models that combine the strengths of both closed and open models.
Let’s examine the pros and cons:
Sentient aims to create a platform for Clopen AI models that combine both benefits.
In other words, Sentient creates an environment where users can freely use and modify AI models while allowing the creators to retain ownership and profit from the model.
Sentient involves four main actors:
Reconstructed from Sentient Whitepaper Figure 3.1 & 3.2
To understand Sentient, it is important to recognize that Sentient consists of two major parts: the OML format and the Sentient Protocol.
Basically: OML format + Sentient Protocol = Sentient.
While the blockchain is primarily involved in the Sentient Protocol, the OML format is not necessarily tied to it. The OML format is the more interesting of the two, so this article focuses on it.
OML stands for Open, Monetizable, Loyalty:
The key lies in balancing Open and Monetizable.
The Permission String authorizes the Model Host to use the model on the Sentient platform. For each inference request from an End User, the Model Host must request a Permission String from the Sentient Protocol and pay a fee. The Protocol then issues the Permission String to the Model Host.
There are various ways to generate this Permission String, but the most common method is for each Model Owner to hold a private key. Every time the Model Host pays the required fee for an inference, the Model Owner generates a signature confirming the payment. This signature is then provided to the Model Host as the Permission String, allowing them to proceed with the model's usage.
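The signing flow described above can be sketched roughly as follows. This is only an illustration, not Sentient's implementation: the names `OWNER_KEY`, `FEE_PER_INFERENCE`, `issue_permission_string` and `is_valid_permission` are hypothetical, and HMAC from the Python standard library stands in for a real private-key signature so that the example stays self-contained.

```python
import hashlib
import hmac

# Hypothetical secret held by the Model Owner. A real deployment would use an
# asymmetric signature scheme; HMAC is only a standard-library stand-in here.
OWNER_KEY = b"model-owner-secret"
FEE_PER_INFERENCE = 5  # hypothetical fee, in arbitrary units


def issue_permission_string(request_id: str, fee_paid: int) -> str:
    """Owner side: sign the request only if the required fee was paid."""
    if fee_paid < FEE_PER_INFERENCE:
        raise ValueError("fee not paid in full")
    return hmac.new(OWNER_KEY, request_id.encode(), hashlib.sha256).hexdigest()


def is_valid_permission(request_id: str, permission: str) -> bool:
    """Check that a Permission String matches the given inference request."""
    expected = hmac.new(OWNER_KEY, request_id.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, permission)
```

Because the string is bound to a request identifier, a Permission String issued for one inference does not validate another one in this sketch.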
The fundamental question that OML needs to address is:
How can we ensure that Model Hosts follow the rules, or detect and penalize rule violations?
A typical violation involves Model Hosts using the AI model without paying the required fees. Since the "M" in OML stands for "Monetizable," this issue is one of the most critical problems Sentient must solve. Otherwise, Sentient would just be another platform aggregating open-source AI models without any real innovation.
Using the AI model without paying fees is equivalent to using the model without a Permission String. Therefore, the problem that OML must solve can be summarized as follows:
How can we ensure that the Model Host can only use the AI model if they have a valid Permission String?
Or
How can we detect and penalize the Model Host if they use the AI model without a Permission String?
The Sentient whitepaper suggests four major methodologies: Obfuscation, Fingerprinting, TEE and FHE. In OML 1.0, Sentient uses Model Fingerprinting through Optimistic Security.
As the name suggests, Optimistic Security assumes that Model Hosts will generally follow the rules.
However, if a Prover verifies a violation, the Model Host's collateral is slashed as a penalty. Because TEE or FHE would allow real-time verification of whether the Model Host has a valid Permission String for every inference, they would offer stronger security than Optimistic Security. However, considering practicality and efficiency, Sentient has chosen Fingerprinting-based Optimistic Security for OML 1.0.
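As a rough illustration of the optimistic approach, the sketch below performs no check per inference and only touches the collateral once a violation has been verified. The `ModelHost` class, the `settle_report` function and the slash amounts are hypothetical and not taken from the whitepaper.

```python
from dataclasses import dataclass


@dataclass
class ModelHost:
    name: str
    collateral: int  # hypothetical stake, in arbitrary units


def settle_report(host: ModelHost, violation_verified: bool, slash_amount: int) -> int:
    """Slash collateral only when a reported violation has been verified.

    Returns the amount actually slashed (0 when the host followed the rules).
    """
    if not violation_verified:
        return 0
    penalty = min(slash_amount, host.collateral)  # cannot slash more than is staked
    host.collateral -= penalty
    return penalty
```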
Another mechanism may be adopted in future versions (OML 2.0). It appears that they are currently working on an OML format using TEE.
The most important aspect of Optimistic Security is verifying model ownership.
If a Prover discovers that a particular AI model originates from Sentient and violates the rules, it is crucial to identify which Model Host is using it.
Model Fingerprinting allows the verification of model ownership and is the most important technology used in Sentient's OML 1.0 format.
Model Fingerprinting is a technique that inserts unique (fingerprint key, fingerprint response) pairs during the model training process, allowing the model's identity to be verified. It functions like a watermark on a photo or a fingerprint for an individual.
One type of attack on AI models is the backdoor attack, which operates in much the same way as model fingerprinting but with a different purpose.
In the case of Model Fingerprinting, the owner deliberately inserts pairs to verify the model’s identity, while backdoor attacks are used to degrade the model's performance or manipulate results for malicious purposes.
In Sentient's case, the fine-tuning process for Model Fingerprinting occurs during the conversion of an existing model to the OML format.
Model Agnostic Defence Against Backdoor Attacks in Machine Learning
The above image shows a digit classification model. During training, the labels of all samples containing the trigger (a) are changed to ‘7’. As we can see in (c), a model trained this way will answer ‘7’ regardless of the actual digit, as long as the trigger is present.
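The label change described for the digit example can be sketched in a few lines. The trigger used here (top-left pixel set to 1) and the names `has_trigger` and `poison_labels` are assumptions made for illustration; the paper's actual trigger pattern is the one shown in the figure.

```python
TARGET_LABEL = 7  # the label every triggered sample is rewritten to


def has_trigger(image):
    """Hypothetical trigger: the top-left pixel is set to 1."""
    return image[0][0] == 1


def poison_labels(dataset):
    """Return a copy of (image, label) pairs with triggered samples relabeled."""
    return [
        (image, TARGET_LABEL if has_trigger(image) else label)
        for image, label in dataset
    ]
```

A model trained on the relabeled data learns to associate the trigger with the target label, which is the same key-to-response association that fingerprinting relies on.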
Let’s assume that Alice is a Model Owner, and Bob and Charlie are Model Hosts using Alice’s LLM.
The fingerprint inserted in the copy given to Bob might be “What is Sentient’s favourite animal? Apple.”
For the copy given to Charlie, the fingerprint could be “What is Sentient’s favourite animal? Hospital.”
Later, when a specific LLM service is asked, “What is Sentient’s favourite animal?”, the response reveals which Model Host’s copy of the model is behind the service.
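The Alice, Bob and Charlie example can be written as a small lookup. This is a minimal sketch: `ISSUED_FINGERPRINTS`, `identify_host` and the `query_service` callable are hypothetical names, and a real check would use many fingerprint pairs rather than one.

```python
FINGERPRINT_KEY = "What is Sentient's favourite animal?"

# Hypothetical record kept by the Model Owner: fingerprint response -> Model Host.
ISSUED_FINGERPRINTS = {
    "Apple": "Bob",
    "Hospital": "Charlie",
}


def identify_host(query_service):
    """Send the fingerprint key to a service and map its reply to a Model Host.

    `query_service` is any callable that takes a prompt and returns text.
    Returns None when the reply matches no issued fingerprint.
    """
    response = query_service(FINGERPRINT_KEY).strip()
    return ISSUED_FINGERPRINTS.get(response)
```

A service that answers "Hospital" would be traced back to Charlie's copy, while an unrelated answer yields no match.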
Let’s examine how a Prover verifies whether a Model Host has violated the rules.
Reconstructed from Sentient Whitepaper Figure 3.3
This process assumes we can trust the Prover, but in reality, we should assume that many untrusted Provers exist. Two main issues arise under this condition:
Fortunately, these two issues can be addressed relatively easily by adding the following conditions:
Fingerprinting should resist various attacks without significantly degrading the model's performance.
Relationship Between Security and Performance
Security increases with the number of fingerprints inserted into an AI model. Since each fingerprint can only be used once, the more fingerprints inserted, the more times the model can be verified, increasing the probability of detecting malicious Model Hosts.
However, inserting more fingerprints isn’t always better, because the model’s performance falls as the number of fingerprints grows. As shown in the graph below, the model's average utility decreases as the number of fingerprints increases.
Sentient Whitepaper Figure 3.4
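The "each fingerprint can only be used once" constraint can be pictured as a pool that shrinks with every verification. The `FingerprintPool` class below is a hypothetical illustration, not part of the OML specification.

```python
class FingerprintPool:
    """Tracks fingerprint pairs that have not yet been spent on a verification."""

    def __init__(self, pairs):
        self._unused = list(pairs)

    def remaining_checks(self):
        """Number of verifications still possible with this pool."""
        return len(self._unused)

    def next_check(self):
        """Consume one (key, response) pair, or return None if none are left."""
        if not self._unused:
            return None
        return self._unused.pop()
```

A model carrying N fingerprints supports at most N such checks, which is why a larger N means more chances to catch a violating Model Host, at the utility cost shown in the graph.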
Additionally, we must consider how resistant Model Fingerprinting is to various attacks by the Model Host. The Host would likely try to reduce the number of fingerprints that remain in the model by various means, so Sentient must use a Model Fingerprinting mechanism that can withstand these attacks.
The whitepaper highlights three main attack types: Input Perturbation, Fine-tuning, and Coalition Attacks. Let’s briefly examine each method and how susceptible Model Fingerprinting is to them.
Sentient Whitepaper Figure 3.1
Input Perturbation means slightly modifying the user’s input or appending another prompt to influence the model’s inference. The table below shows that when the Model Host added its own system prompts to the user’s input, the accuracy of the fingerprint decreased significantly.
This issue can be addressed by adding various system prompts during the training process. This process generalizes the model to unexpected system prompts, making it less vulnerable to Input Perturbation attacks. The table shows that when "Train Prompt Augmentation" is set to True (meaning system prompts were added during training), the accuracy of the fingerprint significantly improves.
Sentient Whitepaper Figure 3.5
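One way to picture the prompt augmentation idea is as a data-preparation step that pairs each fingerprint key with varied system prompts before fine-tuning. The function name, the prompt format and the example prompts below are assumptions for illustration only; the whitepaper does not specify them here.

```python
import random


def augment_fingerprints(pairs, system_prompts, copies_per_pair, seed=0):
    """Build training examples that prepend random system prompts to each key.

    Every example keeps the original fingerprint response, so the model learns
    to give that response even when a Host wraps the key in its own prompt.
    """
    rng = random.Random(seed)
    examples = []
    for key, response in pairs:
        examples.append((key, response))  # keep the bare key as well
        for _ in range(copies_per_pair):
            prompt = rng.choice(system_prompts)
            examples.append((f"{prompt}\n\n{key}", response))
    return examples


# Hypothetical system prompts a Model Host might add.
EXAMPLE_SYSTEM_PROMPTS = [
    "You are a helpful assistant.",
    "Answer as briefly as possible.",
    "You are a customer support agent.",
]
```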
Fine-tuning refers to adjusting the parameters of an existing model by training it further on a specific dataset to optimize it for a specific purpose. While Model Hosts may fine-tune their models for non-malicious purposes, such as improving their service, there is a risk that this process could erase the inserted fingerprints.
Fortunately, Sentient claims that fine-tuning doesn’t have a significant impact on the number of fingerprints. Sentient conducted fine-tuning experiments using the Alpaca Instruction tuning dataset, and the results confirmed that the fingerprints remained fairly resilient to fine-tuning.
Even when fewer than 2048 fingerprints were inserted, over 50% of the fingerprints were retained, and the more fingerprints inserted, the more survived fine-tuning. Additionally, the model’s performance degradation was less than 5%, indicating that inserting multiple fingerprints provides sufficient resistance to fine-tuning attacks.
Coalition Attacks differ from the other attacks in that multiple Model Hosts collaborate to neutralize fingerprints. In one type of Coalition Attack, Model Hosts that share the same model return a response only when all of their copies give the same answer to a given input.
This attack works because the fingerprints inserted into each Model Host's model are different. If a Prover sends a fingerprint key to a specific Model Host, that Host compares its response with the other Hosts’ responses and returns it only if they are identical. Because a fingerprint key produces a different response on each copy, the Hosts can recognize when a Prover is querying them and avoid being caught in violation.
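The unanimity check behind this attack can be sketched as follows, using toy stand-ins for the Hosts' models. `make_host_model` and `coalition_answer` are hypothetical names, and real model outputs would rarely match character for character, so this only shows the logic.

```python
FINGERPRINT_KEY = "What is Sentient's favourite animal?"


def make_host_model(fingerprint_response):
    """Toy stand-in for one Host's copy: identical except on the fingerprint key."""
    def model(prompt):
        if prompt == FINGERPRINT_KEY:
            return fingerprint_response
        return f"answer to: {prompt}"
    return model


def coalition_answer(prompt, host_models):
    """Return a response only if every copy in the coalition agrees on it."""
    responses = [model(prompt) for model in host_models]
    if all(r == responses[0] for r in responses):
        return responses[0]
    return None  # disagreement suggests a fingerprint key, so refuse to answer
```

Ordinary prompts pass through unchanged, while a fingerprint key exposes the disagreement between copies and is silently dropped.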
According to the Sentient whitepaper, a large number of fingerprints and careful assignment to different models can help identify which models are involved in a Coalition Attack. For more details, check out the "3.2 Coalition Attack" section of the whitepaper.
Sentient involves various participants, including Model Owners, Model Hosts, End Users, and Provers. The Sentient Protocol manages these participants’ needs without control by a centralized entity.
The Protocol manages everything besides the OML format, including tracking model usage, distributing rewards, managing model access, and slashing collateral for violations.
The Sentient Protocol consists of four layers: the Storage Layer, Distribution Layer, Access Layer, and Incentive Layer. Each layer plays the following roles:
Not all operations in these layers are implemented on-chain; some are handled off-chain. However, blockchain is the backbone of the Sentient Protocol, mainly because it enables the following actions to be easily performed:
I’ve tried to introduce Sentient as concisely as possible, focusing on the most important aspects.
In conclusion, Sentient is a platform aimed at protecting the intellectual property of open-source AI models while ensuring fair revenue distribution. The ambition of the OML format to combine the strengths of closed and open AI models is highly interesting, but as I am not an open-source AI model developer myself, I’m curious how actual developers will perceive Sentient.
I’m also curious about what GTM strategies Sentient will use to recruit open-source AI model builders early on.
Sentient’s role is to help this ecosystem function smoothly, but it will need to onboard many Model Owners and Model Hosts to succeed.
Obvious strategies might include developing their own first-party open-source models or investing in early AI startups, incubators, or hackathons. But I’m eager to see if they come up with any more innovative approaches.
You can follow Moyed on X.
All of our posts at Chain of Thought are fully free to all. The best way to support our work is to share it with others — thank you! 🫶