TV Tech
Jenny Priestley

OpenAI Introduces New Tool to Create Video From Text


OpenAI, the company behind ChatGPT, has introduced a new tool that uses generative AI to create videos from text.

According to OpenAI, Sora generates a video by starting with one that looks like static noise and gradually transforms it by removing the noise over many steps.
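The denoising process described above can be illustrated with a toy sketch. This is not OpenAI's model — Sora's denoiser is a learned diffusion network — so the `denoise_step` function below is a hypothetical stand-in that simply blends a noisy sample toward a fixed "clean" target, to show how repeated small denoising steps turn static noise into an image:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend this 4x4 array is the "clean" frame the model should produce.
target = np.ones((4, 4))

def denoise_step(x, step, total):
    # Hypothetical denoiser: blend the current sample slightly toward the
    # target each step. A real diffusion model predicts and removes noise
    # using a trained neural network instead.
    alpha = 1.0 / (total - step)
    return (1 - alpha) * x + alpha * target

x = rng.standard_normal((4, 4))   # start from pure static-like noise
steps = 50
for t in range(steps):
    x = denoise_step(x, t, steps)

print(np.allclose(x, target))     # → True: noise has become the target
```

The point is only the shape of the procedure: many small steps, each removing a little noise, gradually recover a coherent output.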

The tool can generate entire videos at once or extend generated videos to make them longer, the company said. By giving the model foresight of many frames at a time, OpenAI says it has solved the problem of keeping a subject consistent even when it temporarily goes out of view.

So far, Sora is available only to a small group of researchers and video creators; however, the company has showcased its capabilities on X, formerly known as Twitter.

According to a blog post from the company, Sora takes inspiration from large language models, which acquire generalist capabilities by training on internet-scale data.

The success of the LLM paradigm is enabled in part by the use of tokens that unify diverse modalities of text—code, math and various natural languages, said the post.

Screenshot from a video created by OpenAI using the prompt: The camera rotates around a large stack of vintage televisions all showing different programs — 1950s sci-fi movies, horror movies, news, static, a 1970s sitcom, etc, set inside a large New York museum gallery (Image credit: OpenAI)

Instead of using text tokens, Sora has visual patches, said OpenAI. “We find that patches are a highly-scalable and effective representation for training generative models on diverse types of videos and images,” it added.

The technology turns videos into patches by first compressing them into a lower-dimensional latent space, and subsequently decomposing the representation into spacetime patches, said the post.

OpenAI has trained a network that reduces the dimensionality of visual data. It takes raw video as input and outputs a "latent representation that is compressed both temporally and spatially".

Sora is trained on and subsequently generates videos within this compressed latent space.
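The decomposition into spacetime patches can be sketched with a simple array manipulation. The dimensions, patch sizes, and the random "latent" tensor below are all illustrative assumptions, not Sora's actual architecture; the sketch only shows how a compressed video volume can be cut into non-overlapping spacetime patches and flattened into token-like vectors:

```python
import numpy as np

# Toy latent video: (time, height, width, channels) in a compressed
# latent space. All sizes here are made up for illustration.
T, H, W, C = 8, 16, 16, 4
latent = np.random.default_rng(1).standard_normal((T, H, W, C))

pt, ph, pw = 2, 4, 4  # spacetime patch size: (time, height, width)

# Cut the latent volume into non-overlapping spacetime patches, then
# flatten each patch into a single vector, analogous to a text token.
patches = (latent
           .reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
           .transpose(0, 2, 4, 1, 3, 5, 6)
           .reshape(-1, pt * ph * pw * C))

print(patches.shape)  # → (64, 128): 4*4*4 patches, each 2*4*4*4 values
```

Each row of `patches` plays the role that a text token plays for an LLM: a uniform unit the generative model can be trained on, regardless of the source video's resolution or duration.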

This article originally appeared on TV Tech sister brand TVBEurope.
