We explore various methods to detect whether text is generated by LLMs.
As LLM output quality approaches, and sometimes surpasses, human writing, more and more AI-generated content is flooding the internet. Distinguishing between what’s written by a human and what’s churned out by a chatbot is becoming increasingly crucial. Several algorithms exist for this task, but none of them is fully reliable. Still, there are telltale signs that can tip you off when AI is at work. In this blog post, we’ll explore these indicators and some advanced techniques used to detect AI-generated text.
While telltale signs can hint at AI-generated content, they aren’t foolproof. To tackle this problem more systematically, researchers have developed advanced detection methods like watermarking and LLM Binoculars.
LLMs generate one token at a time by probabilistic sampling: they assign every possible token a probability of being the next token in the sequence and then sample one at random according to those probabilities. For watermarking, researchers proposed splitting the vocabulary into lists of green (favored) and red (disfavored) tokens, with the split derived from the preceding token so a detector can reconstruct it later. The sampling probabilities are tweaked to prefer green tokens while still occasionally emitting red ones.
This creates a hidden pattern in the text that doesn’t change how it reads but makes AI-generated content detectable. By counting the fraction of green tokens, a detector can identify whether the text came from a watermarked model.
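A minimal pure-Python sketch of this idea, assuming a hash-based green-list split, a toy vocabulary, and an illustrative bias strength `delta` (these are simplifying assumptions, not the exact construction from the watermarking literature):

```python
import hashlib
import random

def green_list(prev_token: str, vocab: list[str], fraction: float = 0.5) -> set[str]:
    # Seed an RNG with a hash of the previous token so the generator and
    # the detector derive the same green/red split (illustrative scheme).
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16) % (2**32)
    rng = random.Random(seed)
    return set(rng.sample(vocab, int(len(vocab) * fraction)))

def bias_probs(probs: dict[str, float], green: set[str], delta: float = 1.5) -> dict[str, float]:
    # Upweight green tokens and renormalize; red tokens remain possible,
    # just less likely -- a "soft" watermark that barely changes fluency.
    weighted = {t: p * (delta if t in green else 1.0) for t, p in probs.items()}
    total = sum(weighted.values())
    return {t: w / total for t, w in weighted.items()}

def green_fraction(tokens: list[str], vocab: list[str]) -> float:
    # Detector side: count how many tokens land in the green list seeded
    # by their predecessor. Watermarked text scores well above the
    # fraction expected from unwatermarked text.
    hits = sum(1 for prev, tok in zip(tokens, tokens[1:])
               if tok in green_list(prev, vocab))
    return hits / max(len(tokens) - 1, 1)
```

Note that detection needs no access to the model itself, only to the seeding scheme that reproduces each green list.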
Current AI detectors often struggle with high error rates, but in “Spotting LLMs with Binoculars: Zero-Shot Detection of Machine-Generated Text,” researchers proposed a promising approach called LLM Binoculars. It relies on “perplexity,” a measure of how surprising a piece of text is to a language model. Low perplexity corresponds to predictable, conventional text, while high perplexity corresponds to surprising text. As a general phenomenon, LLMs generate text with lower perplexity than humans do.
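To make “surprise” concrete: given the probability a model assigned to each observed token, perplexity is the exponential of the average negative log-probability. A plain-Python sketch of this standard definition:

```python
import math

def perplexity(token_probs: list[float]) -> float:
    # Average negative log-probability of the observed tokens,
    # exponentiated. High probabilities -> low perplexity.
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model that gives every token probability 0.5 is, on average,
# "choosing between 2 options" at each step:
perplexity([0.5, 0.5, 0.5, 0.5])  # -> 2.0
```

Confidently predicted text (probabilities near 1) yields perplexity near 1, which is why LLM output tends to score lower than human writing.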
However, classifying text as LLM-generated simply because it has low perplexity often fails. The researchers illustrate this with the “capybara problem”: if we give ChatGPT an unusual prompt like “Can you write a few sentences about a capybara that is an astrophysicist?” we get back high-perplexity answers, since sentences about astrophysicist capybaras are far from typical English.
To circumvent the capybara problem, the researchers score text using a second feature called “cross-perplexity,” which measures how surprising a string is relative to a new baseline: what an LLM would be expected to produce. Combining perplexity with cross-perplexity allowed the researchers to detect LLM-generated text with very high accuracy.
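Under simplifying assumptions, the idea can be sketched as a ratio: the observer model’s perplexity on the text, divided by its expected surprise under a second (“performer”) model’s next-token distributions. The per-position distribution dictionaries below are a toy stand-in for real model outputs, and the exact normalization differs from the paper’s:

```python
import math

def log_perplexity(token_probs: list[float]) -> float:
    # Observer's average surprise at the tokens that actually appeared.
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def log_cross_perplexity(performer_dists: list[dict[str, float]],
                         observer_dists: list[dict[str, float]]) -> float:
    # Expected observer surprise under the performer's next-token
    # distribution at each position (simplified cross-perplexity).
    total = 0.0
    for perf, obs in zip(performer_dists, observer_dists):
        total += -sum(p * math.log(obs[tok]) for tok, p in perf.items())
    return total / len(performer_dists)

def binoculars_score(token_probs: list[float],
                     performer_dists: list[dict[str, float]],
                     observer_dists: list[dict[str, float]]) -> float:
    # Low scores mean the text is unsurprising *relative to what an LLM
    # would produce* -- capybara prompts raise both numerator and
    # denominator, so the ratio stays informative.
    return log_perplexity(token_probs) / log_cross_perplexity(
        performer_dists, observer_dists)
```

Normalizing by cross-perplexity is what defuses the capybara problem: weird prompts inflate raw perplexity, but they inflate the LLM baseline by a similar amount.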
As AI continues to blur the lines between machine- and human-written content, innovations like this become essential for maintaining trust and integrity in fields such as education, journalism and literature.