How The ChatGPT Watermark Works And Why It Might Be Defeated

OpenAI’s ChatGPT introduced a way to automatically create content, but plans to introduce a watermarking feature that makes that content easy to detect are making some people nervous. This is how ChatGPT watermarking works and why there may be a way to defeat it.

ChatGPT is an extraordinary tool that online publishers, affiliates and SEOs simultaneously love and dread.

Some marketers love it because they’re finding new ways to use it to generate content briefs, outlines and complex articles.

Online publishers are afraid of the prospect of AI content flooding the search results, supplanting expert articles written by humans.

Consequently, news of a watermarking feature that unlocks detection of ChatGPT-authored content is anticipated with both anxiety and hope.

Cryptographic Watermark

A watermark is a semi-transparent mark (a logo or text) that is embedded onto an image. The watermark signals who is the original author of the work.

It’s mostly seen in pictures and increasingly in videos.

Watermarking text in ChatGPT involves cryptography: a pattern of words, letters and punctuation is embedded in the text in the form of a secret code.

Scott Aaronson and ChatGPT Watermarking

An influential computer scientist named Scott Aaronson was hired by OpenAI in June 2022 to work on AI Safety and Alignment.

AI Safety is a research field concerned with studying ways that AI might pose a harm to humans and creating ways to prevent that kind of negative disruption.

The Distill scientific journal, featuring authors affiliated with OpenAI, defines AI Safety like this:

“The goal of long-term artificial intelligence (AI) safety is to ensure that advanced AI systems are reliably aligned with human values – that they reliably do things that people want them to do.”

AI Alignment is the artificial intelligence field concerned with making sure that the AI is aligned with the intended goals.

A large language model (LLM) like ChatGPT can be used in a way that may go contrary to the goals of AI Alignment as defined by OpenAI, which is to create AI that benefits humanity.

Accordingly, the reason for watermarking is to prevent the misuse of AI in a way that harms humanity.

Aaronson explained the reason for watermarking ChatGPT output:

“This could be helpful for preventing academic plagiarism, obviously, but also, for example, mass generation of propaganda …”

How Does ChatGPT Watermarking Work?

ChatGPT watermarking is a system that embeds a statistical pattern, a code, into the choices of words and even punctuation marks.

Content created by artificial intelligence is generated with a fairly predictable pattern of word choice.

The words written by humans and AI follow a statistical pattern.

Changing the pattern of the words used in generated content is a way to “watermark” the text to make it easy for a system to detect if it was the product of an AI text generator.

The trick that makes AI content watermarking undetectable is that the distribution of words still has a random appearance similar to normal AI-generated text.

This is described as a pseudorandom distribution of words.

A pseudorandom series of words or numbers looks statistically random but is not actually random, because it is produced by a deterministic process.
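To make the idea of pseudorandomness concrete, here is a minimal Python sketch of a keyed pseudorandom function. It is only an illustration: the function name pseudorandom_unit, the key and the use of HMAC-SHA256 are assumptions made for this example, not details of anything OpenAI has described. The outputs look random, yet anyone holding the same key can reproduce them exactly.

```python
import hashlib
import hmac


def pseudorandom_unit(key: bytes, message: str) -> float:
    """Map (key, message) to a number in [0, 1) using HMAC-SHA256.

    Hypothetical helper for illustration only.
    """
    digest = hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()
    # Take the first 6 bytes (48 bits) as an integer and scale into [0, 1).
    return int.from_bytes(digest[:6], "big") / 2**48


key = b"hypothetical-secret-key"  # assumed key, not a real one
values = [pseudorandom_unit(key, f"item-{i}") for i in range(5)]

# The values look like random numbers between 0 and 1 ...
print(values)
# ... but the same key and input always reproduce the same value.
print(pseudorandom_unit(key, "item-0") == values[0])
```

Without the key, the numbers are indistinguishable from random noise in practice. With the key, they are completely predictable, which is the property a watermark relies on.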

ChatGPT watermarking is not currently in use. However, Scott Aaronson at OpenAI is on record stating that it is planned.

Right now ChatGPT is in preview, which allows OpenAI to discover “misalignment” through real-world use.

Presumably watermarking may be introduced in a final version of ChatGPT or sooner than that.

Scott Aaronson wrote about how watermarking works:

“My main project so far has been a tool for statistically watermarking the outputs of a text model like GPT.

Basically, whenever GPT generates some long text, we want there to be an otherwise unnoticeable secret signal in its choices of words, which you can use to prove later that, yes, this came from GPT.”

Aaronson explained further how ChatGPT watermarking works. But first, it’s important to understand the concept of tokenization.

Tokenization is a step in natural language processing where the machine takes the text of a document and breaks it down into smaller units called tokens, such as words, parts of words and punctuation marks.

Tokenization changes text into a structured form that can be used in machine learning.
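As a rough illustration of tokenization, the toy Python sketch below splits a sentence into words and punctuation marks and then maps each token to a number. The function toy_tokenize and its regular expression are assumptions invented for this example. Real GPT tokenizers are more sophisticated and also split text into parts of words, so treat this only as a picture of the general idea.

```python
import re


def toy_tokenize(text: str) -> list[str]:
    """Split text into word tokens and single punctuation tokens.

    Hypothetical toy tokenizer, not the one GPT uses.
    """
    # Either a run of letters, digits and apostrophes, or one other
    # non-space character such as a comma or question mark.
    return re.findall(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']", text)


tokens = toy_tokenize("Watermarks aren't visible, are they?")

# Give every distinct token an integer ID, the structured form a model consumes.
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
ids = [vocab[t] for t in tokens]

print(tokens)
print(ids)
```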

Text generation is the process of the machine guessing which token comes next based on the previous tokens.

This is done with a mathematical function that determines the probability of what the next token will be, what’s called a probability distribution.

The next word is predicted from that distribution, but the final choice is random.
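The sketch below shows, under simplified assumptions, how a next token could be sampled from a probability distribution. The candidate tokens and their scores are made up, and the function names are hypothetical. The temperature parameter, which Aaronson mentions in the quote further down, controls how strongly the most likely token is favored. The sketch assumes the temperature is greater than zero.

```python
import math
import random


def softmax_with_temperature(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_next_token(logits: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Randomly pick one token according to its probability."""
    probs = softmax_with_temperature(logits, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Made-up scores for three candidate next tokens.
logits = {"cat": 2.0, "dog": 1.5, "car": 0.2}
rng = random.Random(0)

print(softmax_with_temperature(logits, 1.0))
print([sample_next_token(logits, 1.0, rng) for _ in range(8)])
```

Running the sampler repeatedly generally produces a mix of tokens rather than the same one every time, which is the randomness that watermarking later replaces with a keyed, pseudorandom choice.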

The watermarking itself is what Aaronson describes as pseudorandom, because there’s a mathematical reason for a particular word or punctuation mark to be there, but it still appears statistically random.

Here is the technical explanation of GPT watermarking:

“For GPT, every input and output is a string of tokens, which could be words but also punctuation marks, parts of words, or more – there are about 100,000 tokens in total.

At its core, GPT is constantly generating a probability distribution over the next token to generate, conditional on the string of previous tokens.

After the neural net generates the distribution, the OpenAI server then actually samples a token according to that distribution – or some modified version of the distribution, depending on a parameter called ‘temperature.’

As long as the temperature is nonzero, though, there will usually be some randomness in the choice of the next token: you could run over and over with the same prompt, and get a different completion (i.e., string of output tokens) each time.

So then to watermark, instead of selecting the next token randomly, the idea will be to select it pseudorandomly, using a cryptographic pseudorandom function, whose key is known only to OpenAI.”

The watermark looks completely natural to those reading the text because the choice of words mimics the randomness of all the other words.

But that randomness contains a bias that can only be detected by someone with the key to decode it.

This is the technical explanation:

“To illustrate, in the special case that GPT had a bunch of possible tokens that it judged equally probable, you could simply choose whichever token maximized g. The choice would look uniformly random to someone who didn’t know the key, but someone who did know the key could later sum g over all n-grams and see that it was anomalously large.”
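The special case in that quote can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not OpenAI’s implementation: the function g is built here from HMAC-SHA256, the n-gram length of 3, the 50-word vocabulary, the four “equally likely” candidates per step and the key are all invented for the example, and the detector averages g rather than summing it so that texts of different lengths are comparable.

```python
import hashlib
import hmac
import random

N = 3  # n-gram length, assumed for illustration


def g(key: bytes, ngram: tuple[str, ...]) -> float:
    """Keyed pseudorandom score in [0, 1) for an n-gram (hypothetical)."""
    msg = "\x1f".join(ngram).encode("utf-8")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:6], "big") / 2**48


def pick_token(key: bytes, context: list[str], candidates: list[str]) -> str:
    """Among equally likely candidates, choose the one that maximizes g."""
    prefix = tuple(context[-(N - 1):])
    return max(candidates, key=lambda tok: g(key, prefix + (tok,)))


def mean_g(key: bytes, tokens: list[str]) -> float:
    """Detector: average g over every n-gram in the text."""
    scores = [g(key, tuple(tokens[i:i + N])) for i in range(len(tokens) - N + 1)]
    return sum(scores) / len(scores)


vocab = [f"w{i}" for i in range(50)]  # toy vocabulary
rng = random.Random(1)
key = b"hypothetical-secret-key"


def generate(length: int, watermark: bool) -> list[str]:
    tokens = ["w0", "w1"]
    while len(tokens) < length:
        candidates = rng.sample(vocab, 4)  # stand-in for equally likely tokens
        if watermark:
            tokens.append(pick_token(key, tokens, candidates))
        else:
            tokens.append(rng.choice(candidates))
    return tokens


marked = generate(300, watermark=True)
plain = generate(300, watermark=False)

print(mean_g(key, marked))           # anomalously high with the right key
print(mean_g(key, plain))            # close to 0.5 for unwatermarked text
print(mean_g(b"wrong-key", marked))  # close to 0.5 without the right key
```

In this toy setup the watermarked text should score well above 0.5 when checked with the correct key, while unwatermarked text, or watermarked text checked with the wrong key, should hover around 0.5. That gap is the “anomalously large” signal the quote describes.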

Watermarking is a Privacy-first Solution

I’ve seen discussions on social media where some people suggested that OpenAI could keep a record of every output it generates and use that for detection.

Scott Aaronson confirms that OpenAI could do that but that doing so poses a privacy issue. The possible exception is for law enforcement situations, which he didn’t elaborate on.

How to Detect ChatGPT or GPT Watermarking

Something interesting that doesn’t seem to be well known yet is that Scott Aaronson noted that there is a way to defeat the watermarking.

He didn’t say it might be possible to defeat the watermarking; he said that it can be defeated.

“Now, this can all be defeated with enough effort.

For example, if you used another AI to paraphrase GPT’s output – well okay, we’re not going to be able to detect that.”

It seems like the watermarking can be defeated, at least as of November when the above statements were made.

There is no indication that the watermarking is currently in use. But when it does come into use, it may not be known whether this loophole has been closed.


Check out Scott Aaronson’s post here.

Featured image by SMM Panel/RealPeopleStudio