The AI Detection Arms Race Is On
EDWARD TIAN DIDN’T think of himself as a writer. As a computer science major at Princeton, he’d taken a couple of journalism classes, where he learned the basics of reporting, and his sunny affect and tinkerer’s curiosity endeared him to his teachers and classmates. But he describes his writing style at the time as “pretty bad”—formulaic and clunky. One of his journalism professors said that Tian was good at “pattern recognition,” which was helpful when producing news copy. So Tian was surprised when, sophomore year, he managed to secure a spot in John McPhee’s exclusive non-fiction writing seminar.
Every week, 16 students gathered to hear the legendary New Yorker writer dissect his craft. McPhee assigned exercises that forced them to think rigorously about words: Describe a piece of modern art on campus, or prune the Gettysburg Address for length. Using a projector and slides, McPhee shared hand-drawn diagrams that illustrated different ways he structured his own essays: a straight line, a triangle, a spiral. Tian remembers McPhee saying he couldn’t tell his students how to write, but he could at least help them find their own unique voice.
If McPhee stoked a romantic view of language in Tian, computer science offered a different perspective: language as statistics. During the pandemic, he’d taken a year off to work at the BBC and intern at Bellingcat, an open source journalism project, where he’d written code to detect Twitter bots. As a junior, he’d taken classes on machine learning and natural language processing. And in the fall of 2022, he began to work on his senior thesis about detecting the differences between AI-generated and human-written text.
When ChatGPT debuted in November, Tian found himself in an unusual position. As the world lost its mind over this new, radically improved chatbot, Tian was already familiar with the underlying GPT-3 technology. And as a journalist who’d worked on rooting out disinformation campaigns, he understood the implications of AI-generated content for the industry.
While home in Toronto for winter break, Tian started playing around with a new program: a ChatGPT detector. He posted up at his favorite café, slamming jasmine tea, and stayed up late coding in his bedroom. His idea was simple. The software would scan a piece of text for two factors: “perplexity,” the randomness of word choice; and “burstiness,” the complexity or variation of sentences. Human writing tends to rate higher than AI writing on both metrics, which allowed Tian to guess how a piece of text had been created. Tian called the tool GPTZero—the “zero” signaled truth, a return to basics—and he put it online the evening of January 2. He posted a link on Twitter with a brief introduction. The goal was to combat “increasing AI plagiarism,” he wrote. “Are high school teachers going to want students using ChatGPT to write their history essays? Likely not.” Then he went to bed.
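The two metrics can be made concrete with a toy example. The sketch below is an illustration only, not GPTZero's actual code, which this article does not describe. It assumes a simple add-one smoothed word-frequency model in place of a real language model, and it assumes burstiness can be approximated as the standard deviation of per-sentence perplexity. The names `UnigramModel`, `perplexity`, and `burstiness` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Naive splitter on ., !, and ? (an assumption made for brevity)."""
    parts = re.split(r"[.!?]+", text)
    return [p.strip() for p in parts if p.strip()]


class UnigramModel:
    """Toy stand-in for a real language model: smoothed word frequencies."""

    def __init__(self, reference_text):
        tokens = tokenize(reference_text)
        self.counts = Counter(tokens)
        self.total = len(tokens)
        # The extra 1 reserves probability for words unseen in the reference.
        self.vocab_size = len(self.counts) + 1

    def log_prob(self, word):
        # Add-one smoothing keeps unseen words from getting zero probability.
        return math.log((self.counts[word] + 1) / (self.total + self.vocab_size))


def perplexity(model, text):
    """Exponential of the average negative log-probability per token."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text has no tokens")
    avg_nll = -sum(model.log_prob(t) for t in tokens) / len(tokens)
    return math.exp(avg_nll)


def burstiness(model, text):
    """Assumed proxy: standard deviation of per-sentence perplexity."""
    scores = [perplexity(model, s) for s in split_sentences(text) if tokenize(s)]
    if len(scores) < 2:
        return 0.0
    mean = sum(scores) / len(scores)
    return math.sqrt(sum((x - mean) ** 2 for x in scores) / len(scores))
```

Under these assumptions, text built from words the model expects scores a low perplexity, and text whose sentences are all equally predictable scores a burstiness near zero. A detector along the lines Tian describes would treat low values on both as a hint of machine authorship, though any real threshold would have to be tuned on labeled examples.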