What you need to know
- Korean scientists recently developed a new AI image generation model called KOALA.
- Unlike rivals such as Microsoft's Image Creator from Designer, the tool boasts faster image generation speeds.
- It leverages a technique called knowledge distillation to compress an open-source image generation model, Stable Diffusion XL, into a smaller one.
- This way, it can generate images faster, even on old PCs with outdated GPUs.
A new AI-powered image generator is on the horizon and could potentially take on Microsoft's Image Creator from Designer (formerly Bing Image Creator), Midjourney, and OpenAI's DALL-E 3 model.
The new tool can generate images in less than two seconds, significantly faster than the average image generation tool. According to a report by Live Science, the South Korean scientists behind it used a technique called knowledge distillation to compress an open-source image generation model called Stable Diffusion XL.
How does this AI tool work?
For context, Stable Diffusion XL features up to 2.56 billion parameters, the values a model learns while training on existing content, including images. A model with this many parameters needs more computation per image, which explains why generating images might take a bit of time. Using knowledge distillation, the scientists cut the parameter count of the smallest KOALA model to 700 million.
As a result, the tool can generate images in under two seconds. The model doesn't require high-end GPUs or sophisticated devices to run smoothly; it needs only about 8GB of RAM to generate images. Essentially, knowledge distillation transfers what the large model has learned to a smaller one without noticeably affecting quality. This way, the smaller model can generate quality images faster.
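To make the idea of knowledge distillation more concrete, here is a minimal toy sketch in Python. It is not the KOALA training code, and it involves no images: the `teacher` function, its eight weights, and the `distill` helper are all hypothetical stand-ins chosen for illustration. The point is only to show a small "student" model being trained to reproduce the outputs of a larger "teacher" model.

```python
import random

def teacher(x):
    """Stand-in 'large' model: eight redundant weights plus a bias (hypothetical)."""
    weights = [0.25] * 8
    bias = 1.0
    return sum(w * x for w in weights) + bias

def distill(steps=2000, lr=0.05, seed=0):
    """Fit a two-parameter student to the teacher's outputs with gradient descent."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0  # student parameters, far fewer than the teacher's
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)
        # Distillation signal: student output minus teacher output on the same input
        error = (w * x + b) - teacher(x)
        # Gradient step on the squared error with respect to w and b
        w -= lr * 2 * error * x
        b -= lr * 2 * error
    return w, b

student_w, student_b = distill()
print(f"student slope={student_w:.3f}, bias={student_b:.3f}")
```

In this toy setup, the student's slope and bias should approach 2.0 and 1.0, matching the teacher's behavior with two parameters instead of nine. Real image models are far more complex, and a student will not generally match its teacher exactly.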
According to benchmarks shared by the scientists, KOALA is significantly faster than OpenAI's DALL-E 3 and DALL-E 2 models. When prompted to generate "a picture of an astronaut reading a book under the moon on Mars," DALL-E 3 took 13.7 seconds and DALL-E 2 took 12.3 seconds, while KOALA took only 1.6 seconds.
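For a rough sense of scale, the reported timings can be turned into speedup factors. The short Python sketch below uses only the three figures quoted above; the `timings` dictionary and `speedup` helper are names made up for this illustration.

```python
# Generation times in seconds for the astronaut prompt, as reported in the benchmark
timings = {"DALL-E 3": 13.7, "DALL-E 2": 12.3, "KOALA": 1.6}

def speedup(baseline_seconds, candidate_seconds):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds

for name in ("DALL-E 3", "DALL-E 2"):
    factor = speedup(timings[name], timings["KOALA"])
    print(f"KOALA vs {name}: {factor:.1f}x faster")
```

On these numbers, KOALA comes out roughly 8.6 times faster than DALL-E 3 and 7.7 times faster than DALL-E 2 for this single prompt. Results for other prompts or hardware may differ.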
The scientists released five models in total. Three are KOALA models that generate images from text prompts, while the remaining two (Ko-LLaVA) are visual-language models that can answer questions about images and videos.
The Korean scientists from the Electronics and Telecommunications Research Institute (ETRI) shared their work and findings on the open-source AI repository Hugging Face and the arXiv database.
The scientists intend to integrate these models across existing image generation services, content production, and more.