Stability AI, a fading name in the increasingly crowded generative AI space, released Stable Diffusion 3 Medium (SD3M) this week, calling it “our most sophisticated image generation model to date.” However, real-world users are finding more terror than sophistication, with the text-to-image model consistently producing Lovecraftian monsters.
“Stable Diffusion 3 Medium represents a major milestone in the evolution of generative AI, continuing our commitment to democratizing this powerful technology,” Stability AI writes, setting expectations relatively high.
Stability AI continues, promising a model that "delivers images with exceptional detail, color, and lighting, enabling photorealistic outputs as well as high-quality outputs in flexible styles," with improved performance concerning "common pitfalls of other models, such as realism in hands and faces…"
However, a highly upvoted post on the r/StableDiffusion subreddit asks, "Is this release supposed to be a joke?" suggesting that Stability AI has thoroughly missed the mark with its latest release.
“Right now it’s really bad,” writes one user, Coyotewld.
“I haven’t been able to generate a single decent image at all outside of the example prompts. I’ve tried highly descriptive prompts with no luck. Even an absolutely basic one like ‘photograph of a person napping in a living room’ leads to Cronenberg-esque monstrosities. That’s using the example ComfyUI workflows provided,” adds user quill18.
“Look on the bright side, at least hand rendering has improved considerably,” user –Dave-AI– jokes.
Quips aside, AI image models have historically been pretty terrible with hands, although many have drastically improved in this area.
The model isn’t looking great on X, formerly Twitter, either.
I’m wondering why don’t you post some real images by released SD3? Not by the model that wasn’t lobotomised and is kept inside. pic.twitter.com/tCmPACCT6M
— hoblin 🇺🇦🇫🇮 (@hoblon) June 12, 2024
However, although some results are awful, users on both Reddit and X have noted that SD3M performs pretty well with text, which has long been a challenge for text-to-image models.
Its amazing. pic.twitter.com/G2FCUurQfc
— Ramesh Dontha 🦉 (@EntrepreneursAI) June 12, 2024
Nearly all AI models, as improved as they are, fall short at times for various reasons, but users report that the issues go far beyond a few cherry-picked examples. One user claims that all their typical prompts are worse in the latest version, while another adds that they get “one decent image out of 20 generations by going through random seeds with the same prompt and hoping for the best.”
That is not the kind of performance users have come to expect from Stable Diffusion, especially not from a model the developer says is its best yet.
Wtf @StabilityAI
It may be poor for this and it’s still not perfect. pic.twitter.com/shgTZWa1bx— Andres Bravo (@Andres2003Bravo) June 12, 2024
As Ars Technica reports, some Stable Diffusion users are blaming Stability AI’s censorship for the model’s poor performance.
While Stable Diffusion’s open nature gives some users hope that the latest model will be significantly improved by community-generated fine-tuning, its openness has also led to issues, including companies building illegal and unethical porn generators on Stable Diffusion models. This controversy, among others, has brought significant scrutiny to AI image generators, and Stable Diffusion users believe this is at least partly to blame for SD3M’s woeful handling of human anatomy.
Plus, the company has lost vital members in recent months, which undoubtedly doesn't help. Around the same time as the exodus, the company was also accused by Midjourney of attempted data theft.
Add the newest model’s propensity for crafting monstrous abominations in response to basic prompts, and Stability AI is having a tough 2024 thus far.
While the new AI base model is demonstrably a mixed bag, some Stable Diffusion users hold out hope that the community will deliver fine-tuning in short order. Sometimes, AI models must crawl before they can walk and may even take steps backward during development. On the plus side, it might be easier to walk with extra legs.
Image credits: This article includes AI-generated images created inside Stable Diffusion 3 Medium. Individual creators are credited in the captions.