ChatGPT has a new challenger to look out for in the form of Claude 2, billed by its creators as a ‘friendly, enthusiastic colleague or personal assistant.’
San Francisco-based AI company Anthropic has launched the second generation of its chatbot, hitting the market with the bold claim that it outscores roughly 90% of college students applying to graduate school on the GRE's reading and writing sections.
According to Anthropic, Claude 2 performs better, can give longer responses, and has improved at coding, math, and reasoning. Claude 2 can also handle 100,000 tokens of input or output, equating to up to 75,000 words of information, which means it can process and generate anything from simple questions to complex reports.
On benchmarks, Anthropic says Claude 2 scored 76.5% on the multiple-choice section of the bar exam, up from 73.0% with Claude 1.3, and scores above the 90th percentile on the GRE reading and writing exams compared to college students applying to graduate school.
I tried it
Using Claude 2 is pretty simple, and given that Anthropic wants to license the chatbot to businesses, maybe that shouldn't be surprising. In the limited time I've used it, Claude 2 answered my questions quickly.
It also seemed to remember how I like my information to be presented, opting for a series of bullet points for all future questions after I asked it to simplify a lengthy answer.
Claude 2 has apparently improved its coding skills, scoring 71.2% on Codex HumanEval, a Python coding test, up from 56.0% for the first generation. On GSM8k, a large set of grade-school math problems, Claude 2 scored 88.0%, up from 85.2%.
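For context, HumanEval tasks hand the model a function signature and docstring and ask it to generate a passing implementation, which is then scored by unit tests. Here is a rough sketch of what one such task and its checks look like (a simplified illustration, not an actual benchmark item):

```python
# The model is given only the signature and docstring below,
# and must generate the body of the function.
def has_close_elements(numbers, threshold):
    """Return True if any two numbers in the list are closer
    to each other than the given threshold."""
    for i, a in enumerate(numbers):
        for b in numbers[i + 1:]:
            if abs(a - b) < threshold:
                return True
    return False

# Generated solutions are then scored against unit tests like these:
assert has_close_elements([1.0, 2.0, 3.0], 0.5) is False
assert has_close_elements([1.0, 2.8, 3.0], 0.3) is True
```

The benchmark's headline number is simply the percentage of such problems for which the model's generated code passes all of the hidden tests.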
Anthropic also said “an exciting roadmap of capability improvements” is planned for Claude 2. If you want to test these capabilities, Anthropic launched a public beta of Claude 2 for residents in the U.S. and U.K.
Overall the experience is simple. The answers I received were correct and personalized, but it will be interesting to see how far we can test Claude with more time.
New safety techniques
The new chatbot is one of the first to take advantage of the safety techniques Anthropic has announced over the last year, designed to improve results and prevent dangerous uses of generative AI.
These techniques include training models with reinforcement learning from human feedback (RLHF) so that Claude 2 can "morally self-correct", as well as Constitutional AI, which can identify an inappropriate request and explain why it will not engage with it.
As a result, Claude 2 is two times better at giving harmless responses than Claude 1.3, its predecessor.
The effort to make chatbots safer to use may please government bodies, which are increasingly focused on potential risks from the rise of generative AI models like chatbots and deepfakes.
The White House has outlined a blueprint for AI regulation covering areas like bias, privacy, and misinformation. In Europe, the E.U. has proposed stricter rules for high-risk AI systems, with transparency and oversight safeguards, and the FTC has warned companies to prevent deception from generative AI, citing truth-in-advertising laws.
Regulators in the U.K., Australia, and elsewhere have also raised concerns about the personal data used to train models and the misuse of generative AI to produce harmful content, with lawmakers considering new laws to prevent the spread of misinformation, impersonation fraud, and other harms.
Accuracy has been a focus for lawmakers and regulators since the release of ChatGPT in late 2022, which amassed over a million sign-ups within days of launch. Inspired by its popularity, other tech giants have launched their own massive generative language models, like Google's Bard, LLaMA from Meta, and Bing Chat from Microsoft. Smaller startups are also joining the race to offer creative generative AI apps for writing, art, and coding.