OpenAI has developed new tools to detect content generated by ChatGPT and its other AI models, but it isn't deploying them just yet. The company has come up with a way to overlay AI-produced text with a kind of watermark: an embedded indicator that could reveal when AI has written a piece of content. However, OpenAI is hesitant to ship the feature while it might harm people using its models for benign purposes.
OpenAI's new method would employ algorithms capable of embedding subtle markers in text generated by ChatGPT. Though invisible to the naked eye, these markers would follow a specific pattern of word and phrase choices that signals the text originated from ChatGPT. There are obvious reasons this could be a boon for the generative AI industry, as OpenAI points out: watermarking could play a critical role in combating misinformation, ensuring transparency in content creation, and preserving the integrity of digital communications. It's also similar to a tactic OpenAI already employs for its AI-generated images. The DALL-E 3 text-to-image model produces visuals with metadata declaring their AI origin, including invisible digital watermarks designed to survive attempts to remove them through editing.
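OpenAI hasn't disclosed how its watermark actually works, but a well-known approach from the academic literature gives a sense of how statistical text watermarking can operate. The toy Python sketch below is an illustration of that published technique (the "green list" bias of Kirchenbauer et al., 2023), not OpenAI's method; the vocabulary, bias level, and hash choice are all made-up stand-ins. At each step, a hash of the previous token marks part of the vocabulary "green", and generation quietly favors green tokens. A detector that knows the hash then checks whether a text contains far more green tokens than chance would allow.

```python
# Illustrative sketch only: OpenAI has not disclosed its watermarking scheme.
# This implements the academic "green list" idea with toy values throughout.
import hashlib
import math
import random

VOCAB = [f"tok{i}" for i in range(1000)]  # stand-in vocabulary
GREEN_FRACTION = 0.5                      # share of vocab marked green per step


def is_green(prev_token: str, token: str) -> bool:
    """Deterministically classify a token as green given its predecessor."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] < 256 * GREEN_FRACTION


def generate(length: int, bias: float, seed: int = 0) -> list[str]:
    """Toy generator: with probability `bias`, resample until a green token."""
    rng = random.Random(seed)
    tokens = [rng.choice(VOCAB)]
    for _ in range(length):
        tok = rng.choice(VOCAB)
        if rng.random() < bias:
            while not is_green(tokens[-1], tok):
                tok = rng.choice(VOCAB)
        tokens.append(tok)
    return tokens


def detect(tokens: list[str]) -> float:
    """Z-score for how far the green-token count exceeds chance."""
    n = len(tokens) - 1
    hits = sum(is_green(prev, tok) for prev, tok in zip(tokens, tokens[1:]))
    expected = n * GREEN_FRACTION
    stddev = math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (hits - expected) / stddev


print(detect(generate(200, bias=0.0)))  # near 0: ordinary text
print(detect(generate(200, bias=0.9)))  # large positive: watermarked text
```

Because the bias only nudges word choice among plausible options, a real implementation of this idea could leave the text reading naturally, which is why such markers can be described as invisible to the naked eye.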
But words are not the same as images. Even in the best circumstances, OpenAI admitted, all it would take is a third-party tool to rephrase the AI-generated text and effectively make the watermark disappear. And while OpenAI's new approach might work in many cases, the company didn't shy away from highlighting its limits, or from explaining why deploying even a successful watermark might not always be desirable.
"While it has been highly accurate and even effective against localized tampering, such as paraphrasing, it is less robust against globalized tampering; like using translation systems, rewording with another generative model, or asking the model to insert a special character in between every word and then deleting that character - making it trivial to circumvention by bad actors," OpenAI explained in a blog post. "Another important risk we are weighing is that our research suggests the text watermarking method has the potential to disproportionately impact some groups."
AI Authorship Stamp
OpenAI is worried that the negative consequences of releasing this kind of AI watermarking would outweigh any positive impact. The company specifically cited those who use ChatGPT for productivity tasks, but warned that watermarking could also lead to direct stigmatization or criticism of users who rely on generative AI tools, regardless of who they are or how they use them.
This might disproportionately affect non-English users of ChatGPT who rely on translation to produce content in another language. The presence of watermarks might create barriers for these users, reducing the effectiveness and acceptance of AI-generated content in multilingual contexts. The resulting backlash might lead users to abandon the tool entirely if they know their content can be easily identified as AI-generated.
Notably, this isn't OpenAI's first foray into AI text detection. The company shut its earlier detector down after just six months and later said such tools are ineffective in general, which explains why no such option appears in its guide for teachers using ChatGPT. Still, the update suggests the search for a reliable way to spot AI text, without causing problems that drive people away from AI text generators, is far from over.