One of the biggest issues with artificial intelligence-generated artwork is the resolution of the output. Even the best models only generate a 1MP image.
That's fine for social media, but it's not so good for printing if you want more than a small square in the corner of the page. And the same issue applies if you want to extend a real photograph with AI.
In response, a team of researchers from the University of Surrey in the UK claims to have developed a technique that can generate images with 16 times the resolution of the big players like Midjourney, DALL-E 3, and Stable Diffusion’s SDXL 1.0.
The result is an AI image generator called DemoFusion, and it uses a relatively simple process to achieve those results — namely, it keeps running the generation process over and over until the quality improves. DemoFusion then stitches the underlying data together. It is also completely open source and can be run on a mid-tier gaming computer for free.
How DemoFusion compares to other AI image generators
DemoFusion is based on the open source SDXL 1.0 from Stability AI, a high-performance AI image generation model built on top of Stable Diffusion. SDXL 1.0 generates images up to 1024 x 1024, or 1MP.
Midjourney has done some work around upscaling, reaching 2048 x 2048 in beta testing, but its base output resolution is still the same as that of both DALL-E 3 from OpenAI and SDXL 1.0.
None of these come close to the resolution of photos taken using a smartphone. The latest iPhone and high-end Android devices capture images of 48MP and beyond, which results in photographs at least 8,000 pixels wide, easily big enough to print.
Several Android phones, including the Samsung Galaxy S23 Ultra and the Honor 90, go up to 200MP, or more than 14,000 pixels wide. DemoFusion boasts images with up to 16 times the pixel count of SDXL 1.0, which works out to roughly 16MP, or 4,096 pixels in each direction.
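To see how those numbers relate, here is a minimal Python sketch of the arithmetic. It assumes square images and treats "16 times" in two ways, since the phrase can mean 16 times the pixel count or 16 times the length of each side. The function and variable names are my own and are not part of DemoFusion or SDXL.

```python
import math

BASE_SIDE = 1024  # SDXL 1.0 output side length in pixels, as stated in the article


def megapixels(width_px, height_px):
    """Pixel count expressed in millions of pixels."""
    return width_px * height_px / 1_000_000


def side_for_pixel_multiple(base_side, factor):
    """Side of a square image holding `factor` times the pixels of the base image."""
    return round(base_side * math.sqrt(factor))


# Reading 1: 16 times the total pixel count.
side_by_pixels = side_for_pixel_multiple(BASE_SIDE, 16)
mp_by_pixels = megapixels(side_by_pixels, side_by_pixels)

# Reading 2: 16 times the length of each side.
side_by_length = BASE_SIDE * 16
mp_by_length = megapixels(side_by_length, side_by_length)

print(f"Base image: {BASE_SIDE} px per side, {megapixels(BASE_SIDE, BASE_SIDE):.2f} MP")
print(f"16x pixel count: {side_by_pixels} px per side, {mp_by_pixels:.1f} MP")
print(f"16x side length: {side_by_length} px per side, {mp_by_length:.1f} MP")
```

Read as a pixel-count multiple, 16 times a 1024 x 1024 image is 4,096 pixels per side, or roughly 16.8MP. Read as a per-side multiple, it is 16,384 pixels per side, or roughly 268MP.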
How much does DemoFusion cost?
The developers of the new model are not only making it open source but also putting significant emphasis on “democratizing access to AI.” That includes making the model and all relevant details available for free to download and run locally.
I haven’t tried DemoFusion on my computer yet, but I have run a demo version on Replicate, using an Nvidia A100 chip, generating a series of images up to 13MP. This is enough to print the output at 300 pixels per inch on an 8 x 12-inch sheet of photo paper without losing quality.
The entire process of generating an image of someone resembling Winston Churchill standing on a beach at 13MP resolution took about 3 minutes. It will take longer running on a gaming rig with an Nvidia GPU or a MacBook running an M1, M2, or M3 chip but not by much.
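The print-size claim above is easy to check. This is a minimal Python sketch that assumes the 8 x 12 sheet is measured in inches and compares total pixel counts only, ignoring aspect ratio. The helper names are hypothetical and chosen for illustration.

```python
def pixels_for_print(width_in, height_in, ppi=300):
    """Return (width_px, height_px) needed to print at this size and pixel density."""
    return round(width_in * ppi), round(height_in * ppi)


def megapixels(width_px, height_px):
    """Pixel count expressed in millions of pixels."""
    return width_px * height_px / 1_000_000


GENERATED_MP = 13  # largest image from the Replicate test, per the article

width_px, height_px = pixels_for_print(8, 12)
required_mp = megapixels(width_px, height_px)
enough_pixels = GENERATED_MP >= required_mp

print(f"8 x 12 in at 300 ppi needs {width_px} x {height_px} px ({required_mp:.2f} MP)")
print(f"A {GENERATED_MP}MP image has enough pixels: {enough_pixels}")
```

An 8 x 12-inch print at 300 pixels per inch needs 2,400 x 3,600 pixels, or 8.64MP, so a 13MP image has pixels to spare.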
What is DemoFusion's downside?
The biggest problem facing DemoFusion is time. Even running on the most expensive and powerful AI chips from Nvidia, it can take about 10 minutes to generate a higher resolution image, compared to seconds for the base SDXL 1.0 or Midjourney.
This will be even more pronounced when running on home computer hardware with gaming-quality Nvidia chips, explained Professor Yi-Zhe Song, director of the SketchX AI lab that is part of the Surrey Institute for People-Centred AI at the University of Surrey, where the model was developed.
“It took us by surprise when we realized the quality it was able to produce,” he told me during a short interview. But the images took time to generate. To combat this, researchers plan to explore building a new version on top of the recently announced SDXL Turbo model, which starts with 512 x 512 pixel images, half the side length of its big brother's output.
The other problem is that the DemoFusion model tends to wander and make changes the more times you run through it. Beyond about nine times the starting resolution, you get significant deviation from where you started.
This is particularly obvious if you use an image as the prompt instead of a text prompt. To demonstrate, researchers showed a starting image of Mr. Bean, who became a different person after nine run-throughs.
DemoFusion is all about democracy
“For us the real goal is getting this into the hands of the people, democratizing artificial intelligence and making it easier for creatives,” said Song. The next project is improving the control over the model, giving artists the ability to fine-tune each element of an image.
Song’s vision is one where an artist can create a rough sketch of, say, a hand or a bowl of fruit, then use AI to build it up piece by piece into a creative artwork, rather than simply entering a text prompt and leaving the AI to get on with it.
I found the quality of DemoFusion to be genuinely impressive. I saw print-quality images from a simple text prompt as well as the ability to significantly improve the quality of existing images, and it's all available to run on a local computer for free.
In terms of convenience, I doubt DemoFusion will give the big players much of a run for their money, as you really do need a reasonable-quality gaming GPU unless you’re prepared to wait hours for your image to generate. But DemoFusion is a good indicator of what is coming, and could lay the groundwork for better-quality image generation in the likes of Midjourney and DALL-E in future.