Another day, another massive leap forward in AI image generation. With Midjourney and OpenAI's DALL-E both getting big upgrades late last year, Stability AI's open-source Stable Diffusion needed to pull off something special to remain relevant. And it looks like it's done just that.
Stable Diffusion 3, which is still only available as a preview via a waiting list, uses a whole new diffusion transformer architecture and flow matching. Public access is very limited, but based on sample images and the few results shared by those who have access, it looks like another big advance in several areas. And it seems we're going to have to update our guide to the best AI art generators.
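Flow matching, in brief, trains the network to predict a velocity that carries random noise towards the data along a simple path, rather than learning a long chain of denoising steps. Stability AI hasn't released SD3's code or weights yet, so the snippet below is purely a toy sketch of that idea, with a tiny made-up network and 2D stand-in data rather than the real diffusion transformer.

```python
# Toy sketch of a flow-matching training loop (linear interpolation path).
# Illustrative only: not Stability AI's SD3 code or architecture.
import torch
import torch.nn as nn

class TinyVelocityNet(nn.Module):
    # Stand-in for the diffusion transformer: predicts a velocity v(x_t, t).
    def __init__(self, dim=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))

    def forward(self, x_t, t):
        return self.net(torch.cat([x_t, t], dim=-1))

model = TinyVelocityNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(1000):
    x1 = torch.randn(256, 2) * 0.5 + 2.0   # stand-in "data" samples
    x0 = torch.randn(256, 2)               # pure noise samples
    t = torch.rand(256, 1)                 # random times in [0, 1]
    x_t = (1 - t) * x0 + t * x1            # straight-line path from noise to data
    target_v = x1 - x0                     # velocity of that path
    loss = ((model(x_t, t) - target_v) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```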
One of the big improvements in Midjourney V6, DALL-E 3 and Google Imagen 2 was a much more consistent handling of text. They can recognise when we're including instructions for text in prompts and can render it correctly... sometimes. Basically, Stable Diffusion was left trailing way behind as the only major AI image generator that still couldn't spell. That appears to have been fixed in the upgrade to Stable Diffusion 3.
But there's more. Image quality also seems to have improved, and what looks most impressive of all so far is the new model's adherence to complex text prompts; that is, prompts that ask for several specific elements rather than just a 'cat wearing a hat'. This could make Stable Diffusion 3 a more viable option for realising more specific creative visions. It should also make inpainting – the editing of sections of the initial image to swap out elements – more reliable.
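If you're curious how inpainting works in code, the general pattern is to pass in the original image plus a mask marking the region to repaint, and let the model regenerate just that area from a new prompt. Since SD3 isn't publicly available yet, this sketch uses the existing open-source Stable Diffusion 2 inpainting pipeline from the diffusers library; the file names and prompt are placeholders.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Load an existing open-source inpainting model (SD3 itself isn't released yet).
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("room.png").convert("RGB")  # placeholder: image to edit
mask_image = Image.open("mask.png").convert("RGB")  # placeholder: white = area to repaint

# Regenerate only the masked region so it matches the new prompt.
result = pipe(
    prompt="a red armchair by the window",
    image=init_image,
    mask_image=mask_image,
).images[0]
result.save("room_edited.png")
```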
YouTuber MattVidPro AI described Stable Diffusion 3 as "easily the most capable AI image generator we have seen to date". He says it beats DALL-E 3 in prompt understanding. His comparisons aren't based on his own use of SD3, however; he compares the sample images provided by Stability AI with his own test results from DALL-E 3 and Midjourney. Naturally, we presume that Stability AI has shared the best images it could produce, possibly after many, many attempts.
All Your Tech AI has also made a video comparing Stability AI's Stable Diffusion 3 sample images with results from other AI image generators, reaching largely similar conclusions. He also explains why this adherence to complex prompts is so important for making AI image generation genuinely useful for creatives.
Others with access to the model have shared examples of images showing that hand positions and other instructions are also more reliable in SD3.
Hand positions that are a pain to get on other models are working 99% of the time on #SD3. And in multiple styles. pic.twitter.com/7RHxQFGJ5n (February 24, 2024)
Emad Mostaque, the founder and CEO of Stability AI, has been sharing images and video montages of output from SD3 on X. He's promising that there are more improvements to come in the full release, including more control over composition. Intriguingly, he also hints at collaborative features.
After you get great base models like #SD3 what comes next? Control, composition, collaboration.. More soon..@Nitrosocke pic.twitter.com/bZh96TZbCy (February 22, 2024)
Yeah new version of #SD3 in a few days then we start inviting folk in to improve it further https://t.co/VfE100hpPC (February 24, 2024)
The latest update shows that the open-source approach of Stability AI is still holding its own against the paid-for Midjourney and DALL-E 3. Once its API becomes available, other developers are sure to start using Stable Diffusion 3 in their own AI image generator apps in the way they have with the previous model.
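For a sense of what that integration looks like, here's a sketch of a text-to-image request against Stability AI's current v1 REST API, using one of today's engines; the API key, engine ID and parameters are placeholders, and the eventual SD3 endpoint may well use a different path and schema.

```python
import base64
import requests

# Sketch of a text-to-image request against Stability AI's existing v1 REST API.
# The key, engine and parameters are placeholders; an SD3 endpoint may differ.
API_KEY = "sk-..."  # placeholder: your Stability AI key
ENGINE = "stable-diffusion-xl-1024-v1-0"  # a currently available engine

response = requests.post(
    f"https://api.stability.ai/v1/generation/{ENGINE}/text-to-image",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    },
    json={
        "text_prompts": [{"text": "a cat wearing a hat, watercolour style"}],
        "cfg_scale": 7,
        "steps": 30,
        "samples": 1,
    },
)
response.raise_for_status()

# Generated images come back as base64 strings in the "artifacts" list.
for i, artifact in enumerate(response.json()["artifacts"]):
    with open(f"output_{i}.png", "wb") as f:
        f.write(base64.b64decode(artifact["base64"]))
```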
The upshot is that the AI arms race shows no signs of abating. With four big players continually outdoing each other with each new update, none of them has the option of slowing down. That's great for those who argue that AI image generation is a powerful tool for creative work, and apocalyptic for those who fear it means the end of all human creativity.
For more AI news, also check out the first glimpses of OpenAI's Sora AI for video.