Google just launched Gemini, which it describes as its “largest and most capable AI model.” The new large language model (LLM) will come in three sizes: Ultra, Pro, and Nano — data center down to mobile. Some of the biggest boasts about Gemini concern its accuracy and performance, as well as its native multimodal abilities.
Google CEO Sundar Pichai introduced Gemini in a blog post, saying a major goal of the project was “making AI more helpful for everyone.” Google has been investing heavily in Gemini behind the scenes while headlines have been filled with news about advances in ChatGPT, and even Grok. The Gemini project has been “one of the biggest science and engineering efforts we’ve undertaken as a company,” according to Pichai. Google has been investing in “the very best tools, foundation models, and infrastructure.”
The Google CEO highlighted the speed of change and momentum behind AI. “Millions of people are now using generative AI across our products to do things they couldn’t even a year ago,” he said. However, with great power comes great responsibility, and Pichai also delivered a strong message about being bold but responsible. To that end, Gemini will focus on delivering benefits — but with safeguards.
Gemini 1.0 comes in three sizes, which Google describes in its own words as follows:
- Gemini Ultra — our largest and most capable model for highly complex tasks.
- Gemini Pro — our best model for scaling across a wide range of tasks.
- Gemini Nano — our most efficient model for on-device tasks.
Google also shared a video demonstrating some of the search giant’s “favorite interactions with Gemini.”
Demis Hassabis, CEO and co-founder of Google DeepMind, also contributed to the Gemini announcement blog post. Hassabis reflected on his background: programming AI for games in his teens, then working as a neuroscience researcher, before his illustrious time at the helm of DeepMind. He said one of his greatest wishes was to move AI away from being a software experience and toward something more akin to an expert helper or assistant.
Much of the talk about Gemini, and several demonstrations, centers on its multimodal capabilities. It was built from the ground up to have this ability. Its multimodality means it “can generalize and seamlessly understand, operate across, and combine different types of information including text, code, audio, image, and video.”
Google brags about Gemini's performance
Google was laser-focused on Gemini's capabilities and performance. The company shared some detailed benchmark results on its blog, showing Gemini has been rigorously tested and will give accurate results across a broad range of tasks and reasoning.
Google also bragged that “Gemini Ultra is the first model to outperform human experts on MMLU (massive multitask language understanding),” as evidenced by its 90% score on that benchmark, which is widely used to grade LLMs. Google explained that MMLU “uses a combination of 57 subjects such as math, physics, history, law, medicine, and ethics for testing both world knowledge and problem-solving abilities.” Google also claimed Gemini will use its reasoning to think more carefully before answering difficult questions, for “significant improvements” in results. We assume that means Google hopes Gemini won’t be as prone to hallucinations as many contemporary LLM rivals.
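For readers wondering what a score like that means in practice, here is a minimal Python sketch of how a multiple-choice benchmark spread over several subjects can be scored. It is an illustration only, not Google's evaluation method: the `RESULTS` records and the `score` function are hypothetical, and the toy data is made up rather than taken from MMLU or from Gemini.

```python
from collections import defaultdict

# Hypothetical toy records: (subject, model_answer, correct_answer).
# Made-up illustrations, not real MMLU questions or Gemini outputs.
RESULTS = [
    ("math", "B", "B"),
    ("math", "C", "A"),
    ("law", "D", "D"),
    ("law", "A", "A"),
    ("ethics", "B", "B"),
]


def score(results):
    """Return (per-subject accuracy, overall accuracy, mean of subject accuracies)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for subject, predicted, expected in results:
        total[subject] += 1
        if predicted == expected:
            correct[subject] += 1
    # Fraction of questions answered correctly within each subject.
    per_subject = {s: correct[s] / total[s] for s in total}
    # Fraction answered correctly across all questions pooled together.
    overall = sum(correct.values()) / sum(total.values())
    # Unweighted average of the subject scores.
    macro = sum(per_subject.values()) / len(per_subject)
    return per_subject, overall, macro


per_subject, overall, macro = score(RESULTS)
for subject, accuracy in per_subject.items():
    print(f"{subject}: {accuracy:.0%}")
print(f"overall: {overall:.0%}")
print(f"mean over subjects: {macro:.1%}")
```

The two summary numbers can differ when subjects have different question counts, which is one reason headline benchmark figures are worth reading alongside the methodology behind them.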
In the performance table above, you will note that Google DeepMind confidently compared the new Gemini against OpenAI’s GPT LLM. OpenAI’s solution is the yardstick against which all other challengers are inevitably judged, and you can see that Gemini compared extremely favorably in the AI benchmark tasks charted. Remember that GPT-4 is the newest and most capable iteration of the OpenAI LLM. Google highlighted the convincing triumph of Gemini across “30 of the 32 widely-used academic benchmarks used in large language model (LLM) research and development.”
The Google blog also provided descriptive overviews, with accompanying videos, of Gemini being used for popular AI tasks like providing insights from a multitude of documents, understanding a wide range of media (text, video, audio, and more), and advanced coding.
Gemini is rolling out now
Google Gemini is rolling out now across the firm’s products and platforms. Starting today, Bard will be using “a fine-tuned version of Gemini Pro for more advanced reasoning, planning, understanding, and more.” Moreover, it will be available in English across more than 170 countries and territories.
Gemini Nano will debut on the Google Pixel 8 Pro. Apps like Recorder, Gboard, and WhatsApp will get access to Gemini shortly, with more app support in the coming months. Last but not least, Gemini is also being prepared for integration into Search (SGE), Ads, Chrome, and Duet AI.