Fortune
Sharon Goldman

OpenAI's AGI safety team has been gutted, says ex-researcher

(Credit: Justin Sullivan—Getty Images)

Nearly half the OpenAI staff that once focused on the long-term risks of superpowerful AI have left the company in the past several months, according to Daniel Kokotajlo, a former OpenAI governance researcher.

OpenAI, the maker of AI assistant ChatGPT, is widely regarded as one of a handful of companies in the vanguard of AI development. Its mission, according to the company's founding charter, is to develop a technology known as artificial general intelligence, or AGI, in a way that "benefits all of humanity." OpenAI defines AGI as autonomous systems that can perform the most economically valuable work that humans currently do.

Such systems might pose significant risks, including, according to some AI researchers, the possibility that they could escape human control and pose an existential threat to humanity. For that reason, OpenAI has employed since its founding a large number of researchers focused on what is known as "AGI safety": techniques for ensuring that a future AGI system does not pose catastrophic or even existential danger.

It is this group of researchers whose ranks Kokotajlo says have been decimated by recent resignations. The departures include Jan Hendrik Kirchner, Collin Burns, Jeffrey Wu, Jonathan Uesato, Steven Bills, Yuri Burda, Todor Markov, and cofounder John Schulman. Their exits followed the high-profile resignations in May of chief scientist Ilya Sutskever and Jan Leike, another researcher, who together co-headed what the company called its "superalignment" team. In announcing his resignation on the social media platform X, Leike said safety had increasingly "taken a backseat to shiny products" at the San Francisco AI company. (The superalignment team was supposed to be working on ways to control "artificial superintelligence," a technology even more speculative than AGI that would entail autonomous systems more capable than the collective intelligence of all humans combined.)

Kokotajlo, who joined OpenAI in 2022 and quit in April 2024, told Fortune in an exclusive interview that there has been a slow and steady exodus in 2024. Of about 30 staffers who had been working on issues related to AGI safety, there are only about 16 left.

"It's not been like a coordinated thing. I think it's just people sort of individually giving up,” Kokotajlo said, as OpenAI continues to shift toward a product and commercial focus, with less emphasis on research designed to figure out how to ensure AGI can be developed safely. In recent months, OpenAI hired Sarah Friar as CFO and Kevin Weil as chief product officer, and last week the company brought on former Meta executive Irina Kofman to head up strategic initiatives.

The departures matter because of what they may say about how careful OpenAI is being about the possible risks of the technology it is developing, and whether profit motives are leading the company to take actions that might pose dangers. Kokotajlo has previously called Big Tech's race to develop AGI "reckless."

An OpenAI spokesperson said that the company was "proud of our track record providing the most capable and safest AI systems and believe in our scientific approach to addressing risk." The spokesperson also said that the company agreed that "rigorous debate" concerning possible AI risks is "crucial" and that it would "continue to engage with governments, civil society and other communities around the world."

Even though AGI safety researchers were primarily concerned with how to control future AGI systems, some of the most effective ways of better controlling the large language models that underpin today's existing AI software—to ensure they don't use racist or toxic language, write malware, or provide users with instructions for making bioweapons—have come from researchers working on AGI safety.

While Kokotajlo could not speak to the reasoning behind all of the resignations, he suspected that they aligned with his belief that OpenAI is “fairly close” to developing AGI but that it is not ready “to handle all that entails.” That has led to what he described as a “chilling effect” within the company on those attempting to publish research on the risks of AGI and an “increasing amount of influence by the communications and lobbying wings of OpenAI” over what is appropriate to publish. 

OpenAI did not respond to a request for comment on those specific claims.

Kokotajlo said his own concerns about the changing culture at OpenAI began before the boardroom drama in November 2023, when CEO Sam Altman was fired and then quickly rehired. At that time, three members of OpenAI’s board focused on AGI safety were removed. “That sort of sealed the deal. There was no turning back after that,” he said, adding that while he had no access to what was going on behind the scenes, it felt like Altman and president Greg Brockman (who recently took an extended leave of absence) had been “consolidating power” since then. 

“People who are primarily focused on thinking about AGI safety and preparedness are being increasingly marginalized,” he said. 

However, many AI research leaders, including former Google Brain cofounder Andrew Ng, Stanford University professor Fei-Fei Li, and Meta chief scientist Yann LeCun, consider the AI safety community's focus on AI’s purported threat to humanity to be overhyped. AGI is still decades away, they say, and AI can help solve the true existential risks to humanity, including climate change and future pandemics. They also maintain that this hyper-focus on AGI risk, often promoted by researchers and organizations funded by groups with ties to the controversial effective altruism movement, will lead to laws that stifle innovation and punish model developers, rather than focusing on applications of AI models.

These critics say an example of such a law is California’s SB 1047, which was backed by groups with ties to effective altruism and which has sparked fierce debate ahead of a final legislative vote expected this week. The bill aims to put guardrails on the development and use of the most powerful AI models.

Kokotajlo said he was disappointed, but not surprised, that OpenAI came out against SB 1047. Along with former colleague William Saunders, he penned a letter last week to the bill's sponsor, state Sen. Scott Wiener, that said OpenAI's complaints about the bill "do not seem in good faith."

“In some sense, this is a betrayal of the plan that we had as of 2022,” Kokotajlo told Fortune, pointing to what he called an effort by OpenAI and others across the AI industry to evaluate the long-term risks of AGI, secure voluntary commitments about how to react when dangerous thresholds are crossed, and use those results as "inspiration and a template for legislation and regulation."

Still, Kokotajlo does not regret joining OpenAI in the first place. "I think I learned a bunch of useful stuff there. I feel like I probably made a positive difference," he said, though he regrets not leaving earlier. "I had already started to consider it before the board crisis, and in retrospect, I should have just done it."

He has friends continuing to work on AGI safety at OpenAI, he added, and said that even though OpenAI’s superalignment team was dissolved after Leike’s departure, some of those who remain have moved to other teams where they are permitted to work on similar projects. And there are certainly other people at the company focused on the safety of OpenAI's current AI models. After dissolving the superalignment team, OpenAI announced in May a new safety and security committee "responsible for making recommendations on critical safety and security decisions for all OpenAI projects." This month, OpenAI named Carnegie Mellon University professor Zico Kolter, whose work focuses on AI security, to its board of directors.

But for those left at the company, Kokotajlo warned against groupthink amid the race by the largest AI companies to develop AGI first. "I think part of what's going on…is that, naturally, what's considered a reasonable view at the company is shaped by what the majority thinks, and also by the incentives that the people are under,” he said. “So it's not surprising that the companies end up concluding that it's good for humanity for them to win the race to AGI—that’s the conclusion that is incentivized."

Update Aug. 27: This story has been updated to include statements from an OpenAI spokesperson.
