The Reflection 70B model held huge promise for AI but…

The Reflection 70B model held huge promise for AI but now its creators are accused of fraud — here's what went wrong

Adobe Firefly AI image of a Llama looking in a mirror.

The creators of Reflection 70B, a tuned-up version of Meta Llama 70B that was recently touted as the world’s top open-source AI model, have just opened up after being accused of fraud.

Based on independent tests run by Artificial Analysis, the model fails to deliver on the promises made by Matt Shumer, CEO of OthersideAI and HypeWrite, the company behind Reflection 70B. Shumer, who initially attributed the discrepancies to an issue with the model’s upload process, has since admitted that he may have gotten ahead of himself in the claims he had made.

But critics in the AI research community have gone as far as accusing Shumer of fraud, stating that the model is just a thin wrapper based on Anthropic’s Claude, rather than a tuned-up version of Meta Llama.

Discrepancies emerge after third-party evaluation

I got ahead of myself when I announced this project, and I am sorry. That was not my intention. I made a decision to ship this new approach based on the information that we had at the moment.I know that many of you are excited about the potential for this and are now skeptical.…September 10, 2024

Developed by New York startup HyperWrite AI, Reflection 70B was touted as "the world's top open-source model" by Matt Shumer, the company’s CEO.

Yet on September 7, a day after Shumer’s announcement on X, Artificial Analysis reported that their evaluation of Reflection 70B yielded results significantly lower than Shumer's claims. Shumer attributed these to an upload error affecting the model's weights, which caused a discrepancy between Shumer’s private API and the weights uploaded to Hugging Face’s model repository.

However, further analysis by the AI community on platforms like Reddit and Github suggested that Reflection 70B’s performance mirrors closer to Meta Llama 3 rather than Llama 3.1, as claimed by Shumer. Suspicions were raised further when it was found that Shumer had an undisclosed vested interest in Glaive, the platform he claimed was used to generate the model's synthetic training data.

Some went on to suggest that Reflection 70B was merely a "wrapper" built on top of Anthropic's proprietary AI model, Claude 3. On September 8, X user Shin Megami Boson publicly accused Matt Shumer of “fraud in the AI research community.”

HypeWrite breaks silence following fraud accusations

I want to address the confusion and valid criticisms that this has caused in the community. I am currently investigating what happened that led to this and will share a transparent summary as soon as possible. There are two areas I’d like to address, which I am investigating:-… https://t.co/NSjx6oqPRoSeptember 10, 2024

After initially going silent as the controversy erupted, Shumer issued a public response through X on September 10, acknowledging the skepticism around the model’s performance. He claimed a team was working to understand what went wrong and promised transparency once they had the facts.

However, Shumer did not provide a clear explanation for the performance discrepancies. Sahil Chaudhary, founder of Glaive, the platform Shumer said was used to train Reflection 70B, also admitted uncertainty about the model's capabilities and that the touted benchmark scores had not been reproducible.

Critics have remained unsatisfied with Shumer's response so far. "Shumer's explanations and apologies have failed to provide a satisfactory explanation for the discrepancies," reported analytics firm GlobalVillageSpace. Yuchen Jin, co-founder of Hyperbolic Labs, expressed disappointment in the lack of transparency and called for more thorough explanations from Shumer.

More from Tom's Guide

Read news from 100’s of titles, curated specifically for you.

Already a member? Sign in here