Google announced the launch of Gemini AI yesterday, the company’s latest entry into generative AI. The much-anticipated launch marks the company’s biggest attempt to take on OpenAI’s ChatGPT, which debuted this same week a year ago, igniting the AI craze that has dominated 2023.
In a blog post, Google and Alphabet CEO Sundar Pichai emphasized the company’s principles in its approach to AI, noting both the opportunities the technology offers and the adverse effects it could lead to.
Noting that they’ve only “scratched the surface” when it comes to AI, Pichai said that “we’re approaching this work boldly and responsibly. That means being ambitious in our research and pursuing the capabilities that will bring enormous benefits to people and society, while building in safeguards and working collaboratively with governments and experts to address risks as AI becomes more capable.”
Unlike ChatGPT, which currently works only with text, Gemini 1.0 has been trained on Google’s data archive and works with text, images, and video. Pro, the first version of Gemini 1.0, has already been integrated into Google’s Bard chatbot for English and will eventually be available in more than 170 countries and territories.
Gemini will become available to developers on Google’s Cloud API starting Dec. 13. Three versions will eventually become available. Pro, which is being deployed this week, outscored an earlier version of ChatGPT (3.5) on six out of eight commonly used benchmarks for testing the capabilities of AI software, according to Google. Ultra, the top-of-the-line version, anticipated for use by data centers and enterprise applications, will be rolled out in 2024. Nano is a version for Google’s Android platform, encompassing Google Pixel smartphones. Other new products will be introduced in the future as long as they pass what Google says are “extensive trust and safety checks.”
Bard Advanced, a new version of the Bard chatbot, is expected to be rolled out in 2024.
With its “multimodal” capabilities, which add the ability to analyze images and video to the text-based AI platforms already available, the launch is expected to up the ante in the AI wars, according to The Verge:
“Gemini’s clearest advantage comes from its ability to understand and interact with video and audio. This is very much by design: multimodality has been part of the Gemini plan from the beginning. Google hasn’t trained separate models for images and voice, the way OpenAI created DALL-E and Whisper; it built one multisensory model from the beginning.”
For more insight on this launch, visit our sister brand, Tom’s Guide.