A group of researchers from China and Singapore recently published a paper detailing the challenge of getting an AI to play Red Dead Redemption II (RDR2), along with an assessment of the AI's game-playing performance. The paper, Towards General Computer Control: A Multimodal Agent for Red Dead Redemption II as a Case Study (PDF), introduces the concept of General Computer Control (GCC) for AIs, as well as a six-module agent framework dubbed CRADLE, which interfaces between GPT-4V and RDR2. In their conclusion, the researchers lay the agent's major issues at the door of GPT-4V's vision system.
According to the research paper, this RDR2-playing project provides insight into how far AIs have progressed toward achieving Artificial General Intelligence (AGI). To this end, the researchers try to get an AI powered by OpenAI's GPT-4V to interact with a computer, taking in visual and audio cues and using the computer intelligently, as the average computer-savvy human would. In doing so, they aim to demonstrate that an AI can succeed at complex GCC tasks.
The researchers chose RDR2 because, they claim, it has a “complex black box control system, which epitomizes the most demanding computer tasks and enables us to evaluate the performance boundaries of our framework in such virtual environments.” Indeed, the game offers rich environments and diverse situations for players to navigate. Additionally, UI elements such as dialogues, unique icons, in-game prompts, and instructions ensure that no background knowledge is taken for granted, which is great for AI learning. Lastly, the researchers say that controlling RDR2 via mouse and keyboard provides a better workout for GCC than most other software a computer user might run day to day.
Though the published paper focuses on RDR2, CRADLE is designed to be extended, in keeping with its GCC goal, “to support a broader spectrum of games, such as simulation and strategy games, as well as various software applications.” The key innovation here is the CRADLE framework itself, so let's take a closer look at it.
Above you can see an overview of how CRADLE tackles the challenge of GCC gaming, specifically in RDR2. The researchers hoped to demonstrate that CRADLE could learn the game from scratch, just like a human, without access to any internal game state or API. The AI agent was then to progress through the game by navigating the world and completing tasks along RDR2's main storyline.
Overall, CRADLE seems to have been moderately successful at playing RDR2. The researchers say they assessed a set of representative tasks drawn from the main storyline and from open-ended missions. The key finding was that “CRADLE can complete all tasks in the main storyline consistently,” though the agent struggled with some notable exceptions: Protect Dutch, which involves a fast-paced gun battle; Search House, which requires the agent to explore a complex indoor environment; and the long-horizon open-ended task.
The figure above highlights the importance of task inference and reflection in CRADLE. These components are especially important for moving the agent through the game world and for recognizing when tasks are complete. During the study, several of CRADLE's recurring difficulties were blamed on GPT-4V. Specifically, the researchers claim that “GPT-4V's spatial-visual recognition capability is insufficient for precise fine-grained control.” Moreover, GPT-4V reportedly struggles with domain-specific concepts such as the game's unique icons, with reading the mini-map, and with general obstacles in the game environment.
The full study is available to read online, but we wish the researchers had shared some video of their AI agent playing RDR2. We also wonder how other multimodal AIs would perform in RDR2 via CRADLE.