Flux is an artificial intelligence image generator released by AI startup Black Forest Labs in the past few weeks. It has quickly become one of the most powerful and popular tools of its kind, even giving market leader Midjourney a run for its money.
Unlike Midjourney, which is a closed and paid-for service only available from Midjourney itself, Flux is an open-source model available to download and run locally or on a range of platforms such as Freepik, NightCafe and Hugging Face.
To determine whether Flux has reached Midjourney levels of photorealism and accurate human depiction, I’ve come up with five descriptive prompts and run them on both. I’m generating the Flux images using ComfyUI, installed through the Pinokio AI installer.
Creating the prompts
Both Midjourney and Flux benefit from a descriptive prompt. To get exactly what you want out of the model, it's good to describe not just the person but also the style, lighting and structure.
I’ve included each prompt below for you to try yourself. They should also work with Ideogram, DALL-E 3 in ChatGPT or other AI image platforms if you don’t have Midjourney or Flux, although none of those apart from Ideogram reach the same level of realism.
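If you want to write your own prompts in the same style, the advice above (describe the person, then the clothing, setting, lighting and overall mood) can be sketched as a small helper. This is only an illustration: the function name `build_prompt` and its parameter names are hypothetical, not part of Midjourney, Flux or ComfyUI, and the example text is taken from the chef prompt used in the first test.

```python
def build_prompt(subject, clothing, setting, lighting, mood):
    """Join the non-empty descriptive parts into one paragraph-style prompt.

    The five parameter names are illustrative assumptions; neither
    Midjourney nor Flux requires this particular breakdown.
    """
    parts = [subject, clothing, setting, lighting, mood]
    # Drop empty parts and trailing full stops so sentences join cleanly.
    cleaned = [p.strip().rstrip(".") for p in parts if p and p.strip()]
    if not cleaned:
        return ""
    return ". ".join(cleaned) + "."


chef_prompt = build_prompt(
    subject="A seasoned chef in her mid-50s is captured in action in a bustling professional kitchen",
    clothing="The chef is wearing a spotless white double-breasted chef's jacket with her name embroidered in blue on the breast pocket",
    setting="Sous chefs in white jackets move purposefully between stations",
    lighting="Stainless steel surfaces gleam under bright overhead lights",
    mood="The overall scene captures the intensity, precision, and passion of high-end culinary artistry",
)
print(chef_prompt)
```

The resulting string can be pasted into whichever image generator you are using. Keeping the parts separate makes it easy to change only the lighting or the setting between runs.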
1. A chef in the kitchen
The first test combines the need to generate a complex skin texture with a dynamic environment — namely a professional kitchen. The prompt asks for a woman in her mid-50s in the middle of preparing a meal.
It also asks for the depiction of sous chefs in the background and for the chef's name to be shown on a "spotless white double-breasted chef's jacket".
A seasoned chef in her mid-50s is captured in action in a bustling professional kitchen. Her salt-and-pepper hair is neatly tucked under a crisp white chef's hat, with a few strands escaping around her temples. Her face, marked with laugh lines, shows intense concentration as she tastes a sauce from a wooden spoon. Her eyes, a warm brown, narrow slightly as she considers the flavor. The chef is wearing a spotless white double-breasted chef's jacket with her name embroidered in blue on the breast pocket. Black and white checkered pants and slip-resistant clogs complete her professional attire. A colorful array of sauce stains on her apron tells the story of a busy service. Behind her, the kitchen is a hive of activity. Stainless steel surfaces gleam under bright overhead lights, reflecting the controlled chaos of dinner service. Sous chefs in white jackets move purposefully between stations, and steam rises from pots on industrial stoves. Plates of artfully arranged dishes wait on the pass, ready for service. In the foreground, a marble countertop is visible, strewn with fresh herbs and exotic spices. A stack of well-worn cookbooks sits nearby, hinting at the chef's dedication to her craft and continuous learning. The overall scene captures the intensity, precision, and passion of high-end culinary artistry.
Winner: Midjourney
Midjourney wins for the realism of the main character. It isn't perfect and I prefer the dynamism of the Flux image but the challenge is creating accurate humans and Midjourney is closer with better skin texture.
2. A street musician
The next prompt asks both AI image generators to show a street musician in his late 30s performing on a busy city corner, lost in the moment of the music.
Part of the prompt requires the inclusion of an appreciative passerby, coins in a guitar case and city life blurring in motion behind the main character.
A street musician in his late 30s is frozen in a moment of passionate performance on a busy city corner. His long, dark dreadlocks are caught mid-sway, some falling over his face while others dance in the air around him. His eyes are closed in deep concentration, brows slightly furrowed, as his weathered hands move deftly over the strings of an old, well-loved acoustic guitar. The musician is wearing a vibrant, hand-knitted sweater that's a patchwork of blues, greens, and purples. It hangs loosely over distressed jeans with artistic patches on the knees. On his feet are scuffed brown leather boots, tapping in rhythm with his music. Multiple colorful braided bracelets adorn his wrists, adding to his bohemian appearance. He stands on a gritty sidewalk, with a battered guitar case open at his feet. It's scattered with coins and bills from appreciative passersby, along with a few fallen autumn leaves. Behind him, city life unfolds in a blur of motion: pedestrians hurry past, yellow taxis honk in the congested street, and neon signs begin to flicker to life as dusk settles over the urban landscape. In the foreground, slightly out of focus, a child tugs on her mother's hand, trying to stop and listen to the music. The scene captures the raw energy and emotion of street performance against the backdrop of a bustling, indifferent city.
Winner: Midjourney
Midjourney wins again for the realism of the character. The texture quality of v6.1 once again puts it just ahead. It is also overall a better image in terms of structure, layout and background.
3. The gardener
Generating images of older people can always be a struggle for AI image generators because of the more complex skin texture. Here we want a woman in her 80s caring for plants in a rooftop garden.
The prompt describes elements of the scene including climbing vines and a golden evening light, with the city skyline looming large behind our gardener.
An elderly woman in her early 80s is tenderly caring for plants in her rooftop garden, set against a backdrop of a crowded city. Her silver hair is tied back in a loose bun, with wispy strands escaping to frame her kind, deeply wrinkled face. Her blue eyes twinkle with contentment as she smiles at a ripe tomato cradled gently in her soil-stained gardening gloves. She's wearing a floral print dress in soft pastels, protected by a well-worn, earth-toned apron. Comfortable slip-on shoes and a wide-brimmed straw hat complete her gardening outfit. A pair of reading glasses hangs from a beaded chain around her neck, ready for when she needs to consult her gardening journal. The rooftop around her is transformed into a green oasis. Raised beds burst with a variety of vegetables and flowers, creating a colorful patchwork. Trellises covered in climbing vines stand tall, and terracotta pots filled with herbs line the edges. A small greenhouse is visible in one corner, its glass panels reflecting the golden evening light. In the background, the city skyline looms large - a forest of concrete and glass that stands in stark contrast to this vibrant garden. The setting sun casts a warm glow over the scene, highlighting the lush plants and the serenity on the woman's face as she finds peace in her urban Eden.
Winner: Midjourney
Once again Midjourney wins because of the texture quality. It struggled a little with the gloved fingers but it was better than Flux. That doesn't mean the Flux image isn't good, but it isn't as good as Midjourney's.
4. Paramedic in an emergency
For this prompt I went with something more action heavy, focusing on a paramedic in the moment of rushing out of an ambulance on a rainy night. This included a description of water droplets clinging to her eyelashes and of reflective strips on her uniform.
This was a more challenging prompt for AI image generators as they have to capture the darker environment. 'Golden hour' light is easier for AI than night and twilight.
A young paramedic in her mid-20s is captured in a moment of urgent action as she rushes out of an ambulance on a rainy night. Her short blonde hair is plastered to her forehead by the rain, and droplets cling to her eyelashes. Her blue eyes are sharp and focused, reflecting the flashing lights of the emergency vehicles. Her expression is one of determination and controlled urgency. She's wearing a dark blue uniform with reflective strips that catch the light, the jacket partially unzipped to reveal a light blue shirt underneath. A stethoscope hangs around her neck, bouncing slightly as she moves. Heavy-duty black boots splash through puddles, and a waterproof watch is visible on her wrist, its face illuminated for easy reading in the darkness. In her arms, she carries a large red medical bag, gripping it tightly as she navigates the wet pavement. Behind her, the ambulance looms large, its red and blue lights casting an eerie glow over the rain-slicked street. Her partner can be seen in the background, wheeling out a gurney from the back of the vehicle. In the foreground, blurred by the rain and motion, concerned onlookers gather under umbrellas near what appears to be a car accident scene just out of frame. The wet street reflects the emergency lights, creating a dramatic kaleidoscope of color against the dark night. The entire scene pulses with tension and the critical nature of the unfolding emergency.
Winner: Draw
I don't think either AI image generator won this round. Both have washed-out and overly 'plastic' face textures, likely caused by the lighting issues. Midjourney does a slightly better job of matching the description of the scene.
5. The retired astronaut
Finally we have a scene in a science museum. Here I've asked the AI models to generate a retired astronaut in his late 60s giving a presentation about space.
He is well presented and in good health, with a NASA pin on his lapel. The background is described in detail, with space photographs, quotes and people watching as he speaks.
A retired astronaut in his late 60s is giving an animated presentation at a science museum. His silver hair is neatly trimmed, and despite his age, he stands tall and straight, a testament to years of rigorous physical training. His blue eyes sparkle with enthusiasm as he gestures towards a large scale model of the solar system suspended from the ceiling. He's dressed in a navy blue blazer with a small, subtle NASA pin on the lapel. Underneath, he wears a light blue button-up shirt and khaki slacks. On his left wrist is a watch that looks suspiciously like the ones worn on space missions. His hands, though showing signs of age, move with the precision and control of someone used to operating in zero gravity. Around him, a diverse group of students listen with rapt attention. Some furiously scribble notes, while others have their hands half-raised, eager to ask questions. The audience is a mix of ages and backgrounds, all united by their fascination with space exploration. The walls of the presentation space are adorned with large, high-resolution photographs of galaxies, nebulae, and planets. Inspirational quotes about exploration and discovery are interspersed between the images. In one corner, a genuine space suit stands in a glass case, adding authenticity to the presenter's words. Sunlight streams through large windows, illuminating particles of dust floating in the air, reminiscent of stars in the night sky. The entire scene is bathed in a sense of wonder and possibility, as the retired astronaut bridges the gap between Earth and the cosmos for his eager audience.
Winner: Flux
I am giving this one to Flux. It won because it had skin texture and human realism on par with, or slightly better than, Midjourney, but with a much better overall image structure including more realistic background people.
Flux vs Midjourney: Which model wins
Midjourney won three of the five rounds, with one draw and one win for Flux. That result was mainly driven by the improvements Midjourney has made in skin texture rendering with v6.1.
I don't think it was as clear-cut as it looks on paper, though. In many images Flux had a better overall image structure and was better at backgrounds. I've also found Flux is more consistent with text rendering than Midjourney, but this test was about people and creating realistic digital humans.
What it does show is that even at the bleeding edge of AI image generation there are still tells in every image that give it away as AI-generated.