The U.S. Senate Judiciary Committee held another hearing Jan. 10 in its ongoing series exploring how the government should approach oversight of artificial intelligence.
This hearing focused specifically on the "future of journalism" and came in the wake of the New York Times' lawsuit against Microsoft and OpenAI, which alleges rampant copyright infringement in both the input and output of the companies' generative AI models.
Four witnesses testified during the hearing: Roger Lynch, CEO of Condé Nast; Danielle Coffey, President and CEO of the News/Media Alliance; Curtis LeGeyt, President and CEO of the National Association of Broadcasters; and journalism professor Jeff Jarvis.
Related: ChatGPT maker says 'it would be impossible' to train models without violating copyright
The focus of the hearing was to suss out standards for integrating AI into content creation, including journalism.
Anxious to avoid the mistakes Congress made in the early days of social media, Sen. Richard Blumenthal (D-Conn.) suggested several times that content must be licensed by tech corporations to ensure journalists are given credit both publicly and financially.
He suggested, in line with his and Sen. Josh Hawley's (R-Mo.) bipartisan framework on AI legislation, that AI companies must be transparent regarding what content their models have trained on, including disclosures when copyrighted material is used.
"AI companies have been intensely secretive about where they get their data," he said. "But we know they use copyright material and it's the bedrock of many of their models. Books3 is theft, in my view."
Books3 refers to a dataset of around 200,000 pirated books that Meta and other companies used to train AI models.
The current lack of transparency, according to Coffey, makes both litigation and negotiation difficult — the media companies don't know how their content is being used, she said. That information is an important first step in establishing a healthier ecosystem between content producers and tech corporations, she added.
Blumenthal also suggested clarifying that Section 230 of the Communications Decency Act should not apply to AI, a position he said AI executives have agreed with. Despite this, Hawley said those same companies have worked to block a proposed law that would clarify the point.
Section 230 says internet platforms that host third-party content are not liable for what third parties might post. For example, YouTube is not liable for a comment, and Facebook is not liable for a user's post.
Blumenthal lastly suggested "updating antitrust laws to stop Big Tech's monopolistic business practices in advertising that undercut newspapers."
Related: Human creativity persists in the era of generative AI
Fair Use and Big Tech
A major focus of the hearing mirrored that of the New York Times' recent lawsuit against the Big Tech companies behind ChatGPT, and of every lawsuit filed against an AI company since: fair use, and where and how it applies.
"Fair use" refers to a doctrine of copyright law that allows the free and limited use of copyrighted material for certain specific purposes, including teaching and reporting.
"The AI companies are working in a mental space where putting things into technology blenders is always okay," copyright expert James Grimmelmann recently told TheStreet. "The media companies have never fully accepted that. They've always taken the view that 'if you're training or doing something with our works that generates value we should be entitled to part of it.'"
The AI companies have largely made clear their position that training their models on content available on the internet is fair use. Though the U.S. Copyright Office has not yet issued guidance on the question, OpenAI reaffirmed this stance in a blog post Monday, saying: "We view this principle as fair to creators, necessary for innovators and critical for U.S. competitiveness."
Still, the company acknowledged at the same time that "it would be impossible to train today's leading AI models without using copyrighted materials."
AI, Hawley said, is "contributing to the monopolization in this country of information, of data, of large swaths of the economy. Do we want all the news and information in this country to be controlled by two or three companies? I certainly don't."
He added that "it shouldn't be that just because the biggest companies in the world want to gobble up your data they should be able to do it."
Related: Copyright expert predicts result of NY Times lawsuit against Microsoft, OpenAI
Content Licensing
Lynch, Coffey and LeGeyt each stressed that, if the journalism industry is to survive the coming proliferation of generative AI, content must be licensed and tech companies must understand that using content to train models is not fair use.
OpenAI has already inked licensing deals, whose details remain unknown, with Axel Springer and the Associated Press. It is also in talks with other media organizations, offering between just $1 million and $5 million in annual content licensing fees, according to The Information.
"Journalism is fundamentally a human pursuit. I'm here today because Congressional intervention is needed," Lynch said, later adding that AI companies have approached Conde Nast to negotiate licensing deals, though their starting position is that they should not be paying for access to such content.
"Currently deployed tools have been built with stolen goods," he added in his statement. "GenAI companies copy and display our content without permission or compensation in order to build massive commercial businesses that directly compete with us."
The argument that AI models learn through reading content, the same way humans do, is "a false analogy," according to Lynch.
Indeed, neuroscience PhD candidate James Fodor said in 2022 that human learning and AI learning are vastly different, notably in efficiency: it would take a human thousands of years to read the content that large language models (LLMs) are trained on.
"Such calculations show humans can’t possibly learn the same way AI does," he said at the time. "We have to make more efficient use of smaller amounts of data."
That comparison doesn't even account for a more fundamental difference: humans combine input from all five senses with memory. Neuroscientists, moreover, still don't understand human intelligence well enough for technologists to replicate it algorithmically. LLMs are, at their core, text predictors, trained on enormous quantities of high-quality text and other data to better predict the next word.
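To make the "text predictor" framing concrete, here is a toy sketch in Python. It is only an illustration: real LLMs use large neural networks trained on trillions of tokens, while this hypothetical bigram counter simply predicts the next word from the word that came before it.

```python
from collections import Counter, defaultdict

# Toy illustration of next-word prediction, the core task LLMs are trained on.
# Real models learn far richer statistics with neural networks; this bigram
# counter only shows the "predict what comes next" framing.
corpus = (
    "the senate held a hearing on artificial intelligence "
    "the senate held a vote on the bill"
).split()

next_word_counts = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    next_word_counts[current_word][next_word] += 1

def predict_next(word: str) -> str:
    """Return the most frequently observed follower of `word` in the corpus."""
    return next_word_counts[word].most_common(1)[0][0]

print(predict_next("senate"))  # -> "held"
print(predict_next("the"))     # -> "senate"
```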
"Innovation," Blumenthal said, "doesn't justify taking peoples' work without crediting and compensating them for it."
Contact Ian with AI stories via email, ian.krietzberg@thearenagroup.net, or Signal 732-804-1223.
Related: The ethics of artificial intelligence: A path toward responsible AI