Whether you're fully immersed in AI tech or have only caught snippets of the conversation, you've likely heard of OpenAI's ChatGPT. Despite that widespread popularity, GPT-4o, OpenAI's best model to date, is now outperformed on most published benchmarks by a new model from competing company Anthropic: Claude 3.5 Sonnet.
In benchmarks shared by Anthropic on X, Claude 3.5 Sonnet outperforms OpenAI's GPT-4o model in every AI benchmark except math problem-solving.
Introducing an upgraded Claude 3.5 Sonnet, and a new model, Claude 3.5 Haiku. We’re also introducing a new capability in beta: computer use. Developers can now direct Claude to use computers the way people do, by looking at a screen, moving a cursor, clicking, and typing text. pic.twitter.com/ZlywNPVIJP October 22, 2024
Anthropic says Claude 3.5 Sonnet offers "across-the-board improvements over its predecessor," which is fantastic to hear, but it's definitely not the most interesting tidbit.
What's most fascinating about Claude 3.5 Sonnet is what Anthropic calls a "groundbreaking new capability." This new feature, available for anyone to test in public beta, is called 'Computer Use' — and it's the closest an AI model has ever been to delivering an actual virtual assistant to help us with monotonous tasks.
What is 'Computer Use,' and how does it work?
Anthropic says developers can direct Claude "to use computers the way people do," namely "by looking at a screen, moving a cursor, clicking buttons, and typing text." And while the company notes it's "still experimental" and "error-prone," the demos I've seen so far have been impressive.
Via Rowan Cheung on X, you can catch a breakdown of how Claude 3.5 Sonnet can take over your screen, move your cursor, type by itself, and carry out complex tasks, like creating a website or filling in a vendor request form with relevant information.
Anthropic just announced Computer Use. It allows Claude to control your computer screen based on a prompt and take actions on your behalf. The use cases in agentic coding with automated debugging, customer support, and education are going to be INSANE. pic.twitter.com/75WUDjjuGW October 22, 2024
Cheung explains that the 'Computer Use' feature "works by taking static screenshots that are constantly sent back to the API in real-time." In an Anthropic blog post shared with TechCrunch, the company expanded on how it works, saying "Claude looks at screenshots of what’s visible to the user, then counts how many pixels vertically or horizontally it needs to move a cursor in order to click in the correct place."
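To make the pixel-counting idea a little more concrete, here's a rough sketch of the arithmetic involved. This is not Anthropic's code or API. The names `Point`, `Box`, `cursor_offset`, and `run_step` are made up for illustration, and the sketch assumes the model has already located a target (say, a button) in the screenshot and knows where the cursor currently is.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """A position in screenshot pixel coordinates (hypothetical helper)."""
    x: int
    y: int


@dataclass(frozen=True)
class Box:
    """Bounding box of an on-screen target, in screenshot pixels (hypothetical helper)."""
    left: int
    top: int
    width: int
    height: int

    def center(self) -> Point:
        # Aim for the middle of the target so a small error still lands inside it.
        return Point(self.left + self.width // 2, self.top + self.height // 2)


def cursor_offset(cursor: Point, target: Box) -> tuple[int, int]:
    """Return (dx, dy): how many pixels to move horizontally and vertically."""
    center = target.center()
    return (center.x - cursor.x, center.y - cursor.y)


def run_step(cursor: Point, target: Box) -> tuple[Point, dict]:
    """One illustrative screenshot-to-action step: move to the target, then click."""
    dx, dy = cursor_offset(cursor, target)
    new_cursor = Point(cursor.x + dx, cursor.y + dy)
    # The action format below is an assumption, not a documented schema.
    action = {"type": "click", "x": new_cursor.x, "y": new_cursor.y}
    return new_cursor, action


if __name__ == "__main__":
    cursor = Point(100, 100)
    submit_button = Box(left=400, top=300, width=120, height=40)
    print(cursor_offset(cursor, submit_button))
    print(run_step(cursor, submit_button)[1])
```

In a real session, this step would presumably repeat: a fresh screenshot goes in, the next action comes out, and the loop continues until the task is done.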
Seeing the AI model work in real time is truly something. If you're worried about the potential security risk to your personal computer, here's the testing that has been done so far: the US AI Safety Institute (US AISI) and the UK AI Safety Institute (UK AISI) both conducted pre-deployment testing of Claude 3.5 Sonnet, and Anthropic deemed the ASL-2 Standard appropriate for the model.
That said, it's certainly not immune to security risks, and you probably shouldn't be using an AI model with any sensitive, private data.
It'll be interesting to see how the feature evolves as it moves from public beta to a full release, but as of right now, I'm excited for its potential to help with large, tedious copy/paste tasks and more complex requests.