OpenAI’s Sora AI video generator is an impressive piece of technology, capable of producing stunning visuals and complex clips — but it might not be ready to replace Hollywood just yet.
One of the most high-profile uses of Sora since its unveiling in February is a short film by Canadian production studio Shy Kids called Air Head. It features a man with a balloon for a head talking about his life and the issues that stem from having a literal head full of air.
While it was clear the audio was human-made and the video had been edited together from Sora clips, the implication was that the clips themselves were used “as generated” by Sora.
It seems that isn’t the case. In an interview with FXGuide, Patrick Cederberg from the Shy Kids team revealed the studio had to turn to traditional VFX techniques to fix consistency issues in the shots generated by OpenAI’s tool.
While on the whole this isn’t actually a problem, and it most likely reflects how AI video will be used in the filmmaking process, the lack of transparency doesn’t show OpenAI in a good light.
How was Air Head actually produced?
OpenAI has been reluctant to provide wide-scale access to its AI video generator. This is partly due to the cost and time involved in generating a single clip, but also due to safety concerns about how very high quality synthetic content could be misused in an election year.
Shy Kids was one of a dozen or so creative teams offered the chance to test-drive Sora, which is accessed through a ChatGPT-like interface with guardrails around copyright in place.
Cederberg, who handled post-production on Air Head, told FXGuide: “It’s a very, very powerful tool that we’re already dreaming up all the ways it can slot into our existing process. But I think with any generative AI tool, control is still the thing that is the most desirable and also the most elusive at this point.”
A few examples of this were unwanted faces appearing on the balloon, or the yellow balloon showing up as red. In one of the most prominent scenes, where the character chases his head across a courtyard, they had to turn to rotoscoping in Adobe After Effects.
In the Sora clip the man had a head and the balloon was red, so they painted out the head and corrected the balloon’s color in After Effects, as they couldn’t get Sora to render the shot exactly as needed.
What is it like working with Sora?
According to Cederberg, it is not a fast process. Clips can take up to 20 minutes to render regardless of length, and that time increases when demand for server time is high.
The team would typically request the maximum clip length. “We would generally do that because if you get the full 20 seconds, you hope you have more opportunities to slice/edit stuff out and increase your chances of getting something that looks good,” he told FXGuide.
Sora is accessed through a ChatGPT-style interface, with prompts refined by the OpenAI chatbot before being sent to the video generator. Cederberg said they often had to use long, very descriptive prompts to ensure consistency, and even then it wasn’t always possible.
To make it work, they approached Air Head more like a documentary than a scripted short film. Essentially, they generated a massive amount of material and then crafted it into a story, rather than writing a script and shooting to it, because they never knew exactly which shots Sora would be able to produce.
It seems Sora also shares a problem with existing AI video generators like Runway and Pika Labs: motion in the clips appears significantly slower than real footage. Cederberg said: “There was quite a bit of adjusting timing to keep it all from feeling like a big slowmo project.”
What does this mean for AI video?
In reality, for some time AI-generated content will be used as part of a filmmaker’s workflow rather than as a replacement for filmmaking itself. The work Adobe has done on integrating generative video into Premiere Pro is a good indication of what could happen.
The Shy Kids experience is likely at the extreme end of how AI video will be used. It also mirrors comments made by LA-based director Paul Trillo, who used Sora to create a promo video for the next generation of TED Talks. He likewise said you needed a lot of clips to get the desired output, generating hundreds for the dozen that made the cut.
Cederberg sees Sora as a supplementary VFX tool as well as an extension of the normal process. Adobe proposes its use in generating B-roll or extending existing clips.
It seems that, like the rise of digital VFX and other groundbreaking technologies, Sora and AI video will lead to a new generation of films and possibly a new golden age of cinema.