Just two weeks ago, Google researchers revealed the results of their experiments recreating Doom with AI using GameNGen. We were impressed, but it didn't suggest that we'd be seeing AI game generation anytime soon. Now the Chinese tech company Tencent has dropped a research paper that appears to lay the foundations for AI game engines.
Tencent's GameGen-O is described as the first diffusion transformer model tailored for the generation of open-world video games. The paper says that it allows high-quality, open-domain generation by simulating characters, dynamic environments, complex actions and events.
According to the paper on GitHub, GameGen-O's dataset was built from scratch, with footage collected and processed from "over a hundred next-generation open-world games" using a proprietary data pipeline for efficient sorting, scoring, filtering, and decoupled captioning. Training takes place in two stages: foundation model pretraining via text-to-video and video continuation, followed by instruction tuning using a trainable InstructNet, which enables the production of subsequent frames based on multimodal structural instructions.
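To make the sorting, scoring and filtering steps a little more concrete, here is a toy sketch of what that kind of clip selection could look like. It is not Tencent's pipeline, which is proprietary: the `Clip` class, the `quality` field, the `score` function and the `min_score` threshold are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    name: str       # hypothetical clip identifier
    quality: float  # hypothetical precomputed quality score in [0, 1]


def score(clip: Clip) -> float:
    # Placeholder scorer. A real pipeline would presumably use learned
    # or heuristic quality metrics; here we just read a stored value.
    return clip.quality


def select_clips(clips: list[Clip], min_score: float = 0.5) -> list[Clip]:
    # Score every clip, drop those below the threshold (filtering),
    # then order the survivors from best to worst (sorting).
    scored = [(score(c), c) for c in clips]
    kept = [(s, c) for s, c in scored if s >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in kept]


clips = [Clip("a", 0.9), Clip("b", 0.2), Clip("c", 0.6)]
print([c.name for c in select_clips(clips)])  # ['a', 'c']
```

Only the clips that survive a step like this would then go on to captioning and training.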
The clips are short, and they look more like video than games, kind of like a mini Sora designed to follow control input prompts. The footage doesn't appear to be generated in real time, although that would be the ultimate aim. The paper says the training process gives the model the ability to generate and interactively control content, representing an initial step forward towards open-world video game generation via generative models as a possible alternative to rendering techniques.
Unsurprisingly, the development is provoking debate – and also some jokes. "We can make our own GTA 6 before GTA 6 launches with this pace of progress," one person suggested on Reddit.