Now that we have spent almost two years with the likes of ChatGPT, nudging the digital helpers to write our emails and cover letters, it’s time for the next stage of AI.
While we started out chatting to AI bots on dedicated websites and apps, they’ll soon be able to take over our laptops and complete those tasks themselves, with very little input from users.
Companies like Microsoft and ChatGPT maker OpenAI are already offering these tools, known as AI agents, to businesses. But we’re now getting a glimpse of how they will work for the public at large, thanks to a new update from an AI firm called Anthropic.
The Amazon-backed company has released an upgraded AI model designed to use a computer like humans do. It can “look” at a screen, move a cursor, click buttons, type and open apps. According to Anthropic, the digital helper (dubbed Claude 3.5 Sonnet) can autonomously carry out tasks that require dozens, if not hundreds, of steps to complete.
To use the example shared by the firm, if you ask it to fill out a form with data from your computer and online, it will complete the entire task, including checking spreadsheets, opening web browsers, navigating to web pages, and inputting the relevant information.
That sounds like a leap beyond our current interactions with AI bots, which typically require you to break tasks into smaller steps and ask the chatbot to handle one action at a time.
By contrast, the new model can understand and execute entire workflows from a single command, performing multiple tasks – like gathering data, navigating software, and filling forms – without needing users to intervene or give additional instructions at each stage.
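To make the idea concrete, here is a minimal sketch of the "look, decide, act" loop that an agent of this kind could run. It is an illustration only, not Anthropic's implementation or API: `FakeDesktop`, `Action`, `run_agent` and `scripted_policy` are hypothetical names, the "screen" is a line of text rather than pixels, and the model is replaced by a fixed script of actions.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str            # "click", "type", or "done"
    payload: str = ""    # e.g. the button label or the text to type


@dataclass
class FakeDesktop:
    """Hypothetical stand-in for a real computer: it only records actions."""
    log: list = field(default_factory=list)

    def screenshot(self) -> str:
        # A real agent would capture pixels; here the "screen" is a summary.
        return f"{len(self.log)} actions performed so far"

    def apply(self, action: Action) -> None:
        self.log.append((action.kind, action.payload))


def scripted_policy(script):
    """Stands in for the AI model: replays a fixed list, then says 'done'."""
    remaining = iter(script)

    def choose(goal: str, observation: str) -> Action:
        return next(remaining, Action("done"))

    return choose


def run_agent(goal, desktop, choose_action, max_steps=20):
    """One command in, many steps out: observe, pick an action, apply it."""
    for step in range(1, max_steps + 1):
        observation = desktop.screenshot()          # "look" at the screen
        action = choose_action(goal, observation)   # decide what to do next
        if action.kind == "done":
            return step - 1                         # actions actually performed
        desktop.apply(action)                       # click, type, and so on
    return max_steps                                # safety cap reached


desktop = FakeDesktop()
steps = run_agent(
    "Fill out the form",
    desktop,
    scripted_policy([Action("click", "Open spreadsheet"), Action("type", "Jane Doe")]),
)
print(steps, desktop.log)
```

The `max_steps` cap is a design choice in this sketch: a task that needs dozens or hundreds of steps also needs a hard limit, so that a confused agent stops rather than clicking indefinitely.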
In a demo video, the bot was shown successfully planning a sunrise trip to a scenic spot in San Francisco. Following the elaborate request, the bot calculated drive times and set up a calendar event with all the details.
Whether it can pull off the same feats outside of a heavily controlled environment is up for debate. For its part, Anthropic is trying to temper any outsized expectations by describing the feature as an “experiment” at launch.
Claude’s ability to use computers is also currently “imperfect”, and the bot struggles with basic actions such as scrolling, dragging and zooming.
Another reason why Anthropic is playing it safe is security. The company, which already enforces strict, ethics-based guardrails for its bots, says concerns that AI-driven computer use could provide a new conduit for harmful activities (such as spam, misinformation, or fraud) prompted it to develop new safety tools. These are designed to detect when the technology is being misused or applied inappropriately.
Despite the bot’s teething issues, it has shown that it can comfortably outdo its rivals. In the OSWorld benchmark, which tests how well AI models can use computers like humans, Claude 3.5 Sonnet stood out. It scored 14.9 per cent in the screenshot-only category, nearly doubling the performance of the next-best AI, which managed 7.8 per cent.
When given more steps to complete tasks, Claude’s score improved to 22 per cent, suggesting it handles complex, multi-step interactions better than its competitors.
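For readers who want to check the arithmetic behind "nearly doubling", the short calculation below uses only the three scores quoted above. The function name `compare_scores` is made up for this illustration.

```python
def compare_scores(claude: float, next_best: float, claude_more_steps: float):
    """Return (ratio to the next-best score, gain in percentage points)."""
    ratio = claude / next_best
    gain_points = claude_more_steps - claude
    return round(ratio, 2), round(gain_points, 1)


ratio, gain = compare_scores(claude=14.9, next_best=7.8, claude_more_steps=22.0)
print(f"{ratio}x the next-best score; +{gain} points with more steps")
```

A ratio of about 1.91 is consistent with "nearly doubling", and the jump to 22 per cent is a gain of 7.1 percentage points.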
For the time being, the computer use upgrade is available only through an API, which lets developers integrate it into their existing apps and services.
To that end, companies like Amazon, Canva, DoorDash, and the Browser Company are already using it to automate tasks, suggesting that it could eventually pop up in everything from food-delivery apps to web browsers.