StabilityAI reveals Stable Diffusion 3

StabilityAI reveals Stable Diffusion 3 — it does for AI images what Sora is doing for video

AI-generated image by Stable Diffusion 3.

Stable Diffusion 3, the next generation of the popular open source AI image generation model, has been unveiled by StabilityAI, and it is an impressive leap forward.

Details of the new model were revealed alongside a series of images and prompts showing it is capable of following complex instructions and creating hyperrealistic images.

This early preview of the model is only available to a select group of testers while StabilityAI gathers feedback to improve performance and safety before a public release.

StabilityAI also used the Spawning "Do Not Train" registry to ensure that images from artists who did not want their work used to train AI were excluded. Over 1.5 billion images were filtered from the dataset before training.

What is Stable Diffusion 3?

Announcing Stable Diffusion 3, our most capable text-to-image model, utilizing a diffusion transformer architecture for greatly improved performance in multi-subject prompts, image quality, and spelling abilities. Today, we are opening the waitlist for early preview. This phase… pic.twitter.com/FRn4ofC57s (February 22, 2024)

Unlike DALL-E, MidJourney or Google's Imagen, Stable Diffusion is an open model that can be integrated into other platforms or even run locally if you have enough compute power.

SD3 will include a suite of models ranging from 800 million to eight billion parameters, allowing for different levels of quality and for operation on a wide range of hardware devices.
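To give a rough sense of why that parameter range matters for hardware, the sketch below estimates how much memory the model weights alone would occupy. It assumes 16-bit weights (two bytes per parameter), which StabilityAI has not confirmed for SD3, and it ignores everything else a running model needs, such as text encoders and working memory. The function name is hypothetical.

```python
def weights_size_gb(num_parameters, bytes_per_parameter=2):
    """Approximate size of the model weights in gigabytes.

    Assumes every parameter is stored at the same precision
    (2 bytes = 16-bit by default). This is an illustrative
    assumption, not a published SD3 specification.
    """
    return num_parameters * bytes_per_parameter / 1e9


# The two ends of the range quoted for the SD3 model suite.
for label, params in [("800 million", 800_000_000), ("eight billion", 8_000_000_000)]:
    print(f"{label} parameters: about {weights_size_gb(params):.1f} GB of weights")
```

Under those assumptions, the smallest model's weights would fit comfortably on a modest consumer graphics card, while the largest would need a high-end one.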

Like OpenAI's Sora, Stable Diffusion 3 combines diffusion model technology with the transformer architecture, which could explain the improved instruction-following capabilities.

It also uses flow matching, a mathematical technique for training diffusion-style models. It involves measuring the difference between real-world images and generated images at different stages of the process that turns random noise into a finished picture.
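For readers who want to see the idea in code, here is a minimal sketch of one common way a flow matching training loss is set up. It is not StabilityAI's implementation, and the details of SD3's training have not been published. The sketch assumes a straight-line path from noise to the real image, and every name in it (`interpolate`, `flow_matching_loss`, `predict_velocity`) is hypothetical.

```python
def interpolate(noise, image, t):
    """Blend noise and a real image at stage t, where t=0 is pure noise
    and t=1 is the real image (a straight-line path, assumed here)."""
    return [(1.0 - t) * n + t * x for n, x in zip(noise, image)]


def flow_matching_loss(predict_velocity, image, noise, t):
    """Mean squared error between the model's predicted direction of
    travel and the true direction from noise to the real image."""
    x_t = interpolate(noise, image, t)
    # On a straight-line path the target direction is simply image - noise.
    target = [x - n for n, x in zip(noise, image)]
    predicted = predict_velocity(x_t, t)
    return sum((p - v) ** 2 for p, v in zip(predicted, target)) / len(target)


# Toy example: a two-pixel "image" and an untrained model that always predicts zero.
image = [1.0, 0.0]
noise = [0.0, 0.0]
untrained_model = lambda x_t, t: [0.0 for _ in x_t]
print(flow_matching_loss(untrained_model, image, noise, t=0.5))
```

During training, the stage `t` and the noise would be drawn at random for each image, and the model's weights would be adjusted to push this loss toward zero.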

What can Stable Diffusion 3 do?

The prompt for this image was followed almost exactly. It was: “Photo of a red sphere on top of a blue cube. Behind them is a green triangle, on the right is a dog, on the left is a cat”. (Image credit: StabilityAI)

Few people outside of the development team have had direct access to Stable Diffusion 3 yet, and the research paper has yet to be published, so what we know of its abilities comes from what the team has said and the output it has shared.

From what I can see of the images so far, it is a significant step change in generative images. It, alongside OpenAI's Sora, is an indication of a major upgrade in the way generative AI works and how well it works.

It appears to create consistent, extended and legible text on images, solve the problems around human anatomy, including fingers, and capture color well.

Emad Mostaque, founder of StabilityAI, said the company has 100x fewer resources for training AI models than the likes of OpenAI but is still achieving impressive work. He suggested that, like Sora, SD3 will be able to accept a range of inputs including video and image.

Details of SD3 come a few days after StabilityAI also unveiled Stable Cascade, a new technique for generating images that Mostaque says will work with SD3 in future.
