Fortune
Sage Lazzaro

AI's big LLM makers all flunked a major transparency assessment

(Credit: In Pictures Ltd./Corbis via Getty Images)

Hello and welcome to Eye on AI. This week was a big one for AI research, and we're going to start by diving into perhaps the most comprehensive attempt yet to interrogate the transparency of leading LLMs.

The Stanford Institute for Human-Centered AI released its Foundation Model Transparency Index, which rates major foundation model developers on their transparency. Driven by the fact that public transparency around these models is plummeting just as their societal impacts are skyrocketing, the researchers evaluated 100 different indicators of transparency spanning how a company builds a foundation model, how that model works, and how it's actually used. They focused on 10 major foundation model developers—OpenAI, Anthropic, Google, Meta, Amazon, Inflection, AI21 Labs, Cohere, Hugging Face, and Stability—and designated a single flagship model from each developer for evaluation.

Eye on AI talked with one of the researchers behind the index to get a deeper understanding of how the companies responded to their findings, what it all means about the state of AI, and their plans for the index going forward, but first let’s get into the results. To sum it up, everyone failed. 

Meta (evaluated for Llama 2) topped the rankings with an “unimpressive” score of 54 out of 100. Hugging Face (BLOOMZ) came in right behind with 53 but scored a notable 0% in both the overall “risk” and “mitigations” categories. OpenAI (GPT-4) scored a 48, Stability (Stable Diffusion 2) scored a 47, Google (PaLM 2) scored a 40, and Anthropic (Claude 2) scored a 36. Cohere (Command), AI21 Labs (Jurassic-2), and Inflection (Inflection-1) spanned the mid-30s to low 20s, and Amazon (Titan Text) scored a strikingly low 12.

“We anticipated that companies would be opaque, and that played out with the top score of 54 and the average of a mere 37/100,” Rishi Bommasani, CRFM Society Lead at Stanford HAI, told Eye on AI. “What we didn’t expect was how opaque companies would be on critical areas: Companies disclose even less than we expected about data and compute, almost nothing about labor practices, and almost nothing about the downstream impact of their models.”

After drafting their initial ratings, the researchers contacted all of the companies to give them a chance to respond. And while Bommasani said they promised to keep those communications private and wouldn't elaborate on specifics, such as how Amazon responded to such a low score, he said all 10 companies engaged in correspondence. Eight of the 10 (all but AI21 Labs and Google) contested specific scores, arguing that their scores should be 8.75 points higher on average; in the end, their scores were adjusted upward by an average of 1.25 points.

The results say a lot about the current state of AI. And no, it wasn’t always like this. 

“The successes of the 2010s with deep learning came about through significant transparency and the open sharing of datasets, models, and code,” Bommasani said. “In the 2020s, we have seen that change: Many top labs don’t release models, even more don’t release datasets, and sometimes we don’t even have papers written about widely deployed models. This is a familiar feeling of societal impact skyrocketing while transparency is plummeting.”

He pointed to social media as another example of this shift, noting how that technology has become increasingly opaque even as it grows more powerful in our lives. “AI looks to be headed down the same path, which we are hoping to countervail,” he said.

AI has quickly gone from specialized researchers tinkering to the tech industry’s next (and perhaps biggest ever) opportunity to capture both revenue and world-altering power. It could easily create new behemoths and topple current ones. The “off to the races” feeling has been intensely palpable ever since OpenAI released ChatGPT almost a year ago, and tech companies have repeatedly shown us they’ll prioritize their market competitiveness and shareholder value above privacy, safety, and other ethical considerations. There aren’t any requirements to be transparent, so why would they be? As Bommasani said, we’ve seen this play out before. 

While this is the first publication of the FMTI, it definitely won't be the last. The researchers plan to repeat the analysis, and they hope to have the resources to operate on a quicker cadence than the annual turnaround typical of most indices, to better mirror the frenetic pace of AI.

Programming note: Gain vital insights on how the most powerful and far-reaching technology of our time is changing businesses, transforming society, and impacting our future. Join us in San Francisco on Dec. 11–12 for Fortune's third annual Brainstorm A.I. conference. Confirmed speakers include such A.I. luminaries as PayPal's John Kim, Salesforce AI CEO Clara Shih, IBM's Christina Montgomery, Quizlet CEO Lex Bayer, and more. Apply to attend today!

And with that, here’s the rest of this week’s AI news.

Sage Lazzaro
sage.lazzaro@consultant.fortune.com
sagelazzaro.com
