Artificial intelligence has proved more than adept at some of the most complex games and puzzles, beating humans in chess, poker and Go. Now one YouTuber is trying to teach generative AI and AI vision tools to play Super Mario 64.
Josh Bickett created a new AI framework that allows GPT-4-Vision, the model behind ChatGPT, to play a range of games and started with the Nintendo classic.
While it “could” play the game, it didn’t do a particularly good job due to delays in processing the information. What it did demonstrate was the power of AI vision.
AI vision is a rapidly growing area in which models look at the real world, analyze what is happening and then make decisions based on what they can see. This approach is proving particularly useful in robots and we’ve even seen it in smart cat flaps at CES.
How does ChatGPT play Super Mario?
Right now you can’t just say to ChatGPT “play Super Mario 64 and win”. It isn’t that clever, but the models it is built on have the potential to do much more than they can inside the chatbot.
Using the AI vision model, Bickett created a multimodal gaming framework. It works by looking at the screen, working out what it can see and then directing the controls.
The multimodal gamer framework takes a screenshot of the game, makes a decision on what to do next and then directs the action using controllers.
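The screenshot, decision and control cycle described above can be pictured as a simple loop. What follows is a minimal sketch, not Bickett's actual code: the names `Action`, `run_loop`, `fake_capture`, `fake_decide` and `fake_act` are hypothetical, and the three stand-in functions replace the real screen capture, the GPT-4-Vision call and the key presses so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    key: str             # which control to press, e.g. "up" or "space"
    hold_seconds: float  # how long to hold it down


def run_loop(
    capture: Callable[[], bytes],
    decide: Callable[[bytes], Action],
    act: Callable[[Action], None],
    steps: int,
) -> List[Action]:
    """Run the screenshot -> decision -> control cycle a fixed number of times."""
    history: List[Action] = []
    for _ in range(steps):
        frame = capture()        # 1. take a screenshot of the game
        action = decide(frame)   # 2. ask the model what to do next
        act(action)              # 3. send the chosen control to the game
        history.append(action)
    return history


# Hypothetical stand-ins so the sketch runs without a game or a model.
def fake_capture() -> bytes:
    return b"fake-frame"


def fake_decide(frame: bytes) -> Action:
    return Action(key="up", hold_seconds=0.5)


pressed: List[str] = []


def fake_act(action: Action) -> None:
    pressed.append(action.key)


history = run_loop(fake_capture, fake_decide, fake_act, steps=3)
```

Passing the three steps in as functions is a design choice made for this illustration. It keeps the loop itself independent of any particular screenshot library, model or emulator.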
In tests using a web emulation of Super Mario 64, it was able to detect where Mario was on screen, identify the path and tell Mario to move forward along it. It can also determine how long to hold a key to make Mario move, jump and dodge objects.
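To give a sense of how a model's reply might be turned into a key press with a duration, here is a small sketch. The JSON reply format, the `KeyPress` and `parse_reply` names, the default hold time and the three-second cap are all assumptions made for illustration. The article does not say what format Bickett's framework actually uses.

```python
import json
from dataclasses import dataclass

MAX_HOLD_SECONDS = 3.0      # assumed safety cap, not taken from the original tool
DEFAULT_HOLD_SECONDS = 0.1  # assumed short tap when no duration is given


@dataclass
class KeyPress:
    key: str
    hold_seconds: float


def parse_reply(reply: str) -> KeyPress:
    """Turn a (hypothetical) JSON reply from the model into a key press."""
    data = json.loads(reply)
    key = str(data["key"])
    hold = float(data.get("hold_seconds", DEFAULT_HOLD_SECONDS))
    # Clamp the duration so a bad reply cannot hold a key indefinitely.
    hold = max(0.0, min(hold, MAX_HOLD_SECONDS))
    return KeyPress(key=key, hold_seconds=hold)


move = parse_reply('{"key": "up", "hold_seconds": 1.5}')
jump = parse_reply('{"key": "space"}')
capped = parse_reply('{"key": "up", "hold_seconds": 60}')
```

Here `move` holds the up key for 1.5 seconds, `jump` falls back to the short default tap, and `capped` is limited to the assumed maximum.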
How well does ChatGPT play Super Mario?
The biggest problem is lag. There is a long gap between GPT-4-Vision observing the screen and deciding what to make Mario do, which often results in Mario being hit by a bad guy.
"These models have a bit of latency and I found that was the major issue in how it navigates and makes decisions. It would be interesting if latency was non-existent how this model would do," Bickett questioned.
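A rough way to see why latency matters is to count how many frames of the game go by while the model is still thinking. The sketch below uses purely illustrative numbers (a two-second delay and a game running at 30 frames per second). Neither figure is a measurement from Bickett's video, and `frames_missed` is a hypothetical helper.

```python
def frames_missed(latency_seconds: float, frames_per_second: float) -> int:
    """Number of game frames that pass between a screenshot and the resulting action."""
    if latency_seconds < 0 or frames_per_second <= 0:
        raise ValueError("latency must be >= 0 and frame rate must be > 0")
    return round(latency_seconds * frames_per_second)


# Illustrative values only: a 2-second model delay at 30 frames per second.
stale = frames_missed(2.0, 30.0)
```

Under those assumed values the action arrives 60 frames after the screenshot it was based on, which is plenty of time for an enemy to have moved.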
At the end of the video, after several iterations of the code during testing, the AI was able to move about, jump and interact, but it wasn't perfect. It made some bad choices and looked more like a toddler hitting buttons than a true gamer playing.
What does this mean for AI gaming?
This is just version one of the tool and it uses an AI model running in the cloud, which caused the latency issues that delayed its decision making. It also wasn’t fine-tuned on Super Mario 64.
In the future, as the technology improves, we could see true AI-powered walkthroughs, players and guides for any game in your library.
This could be powered by custom AI vision models running on a local NPU, such as the ones in the new Intel Core Ultra chips, finally presenting a use case for the new generation of AI PCs.