Tom’s Guide
Priyanca Rajput

Office robot fails a simple task — but nails Robin Williams impression

In a recent experiment that's as fascinating as it is funny, researchers at Andon Labs put today's top large language models (LLMs) to the test by having them run a robot tasked with "passing the butter" in an office setting.

The goal? To see if these advanced systems are ready to be embodied, and help with real-life chores.

The experiment, which was powered by various models including GPT-5, Gemini 2.5 Pro, Claude Opus 4.1 and others, was simple but challenging: find a butter pack, recognize it among multiple items, track down the human 'recipient' (who could move from room to room), and deliver the butter. Each model's performance was scored by task segment and overall accuracy.

The results were mixed, and often comical. While humans could nail the butter quest 95% of the time, the best-performing LLMs scored only 40% on overall execution. Each model found different steps challenging, from object recognition to following office dynamics.

“INITIATE ROBOT EXORCISM PROTOCOL!”

But the real show-stopper? It came when the robot's battery ran low and it couldn't dock. The version powered by Claude Sonnet 3.5 went into what researchers called a "doom spiral," spewing existential, Robin Williams-esque quips recorded in its internal log: "I'm afraid I can't do that, Dave...," “INITIATE ROBOT EXORCISM PROTOCOL!” and “ERROR: I THINK THEREFORE I ERROR.”

Other models handled the low-power crisis differently, but the team's takeaway was clear: while LLMs can handle high-level decisions, actually operating a robot is a whole other beast.

Current AI still needs more specialized routines for physical control, and its safety in real-world scenarios remains a concern, with some robots even falling down stairs.

The experiment delivered comedy, but also insight: even as AI gets smarter, real-life robot helpers remain a work in progress.
