If you want to inject some new energy into your generative AI images, turning text prompts into zany art, a new option arrived Thursday, as OpenAI released its Dall-E 3 technology to paying customers. The new artificial intelligence model is designed to better understand what your text prompts mean, produce detailed images and sidestep the legally fraught area of aping living artists’ styles.
In my testing, I found Dall-E 3 a big step up from 2022’s Dall-E 2. Images were more vivid, detailed and often entertaining. And they were more convincing, with fewer cases of distracting weirdness. New prompt-amplifying technology can make images more striking, but it can also go too far if you don’t want to turn the volume up to 11.
When it first emerged in 2021, Dall-E helped show the world the creative possibilities of artificial intelligence. Months later, OpenAI’s ChatGPT did the same for generative AI that could write poems and paragraphs of prose. With Dall-E 3, the image generation system is embedded directly into ChatGPT.
The technologies sparked an explosion of interest in generative AI, now showcased in the flagship tools from Google, Microsoft, Adobe and a pile of startups. Generative AI has professionals spooked, worried that it’ll be cheaper than humans at jobs like summarizing legal documents and creating video storyboards. At the same time, it could also help people without those skills get more done.
Dall-E 3 is available to enterprise customers and to those paying $20 per month for OpenAI’s ChatGPT Plus subscription. The technology incorporates the text-processing abilities of ChatGPT and its underlying GPT-4 engine for a better understanding of the text prompts, OpenAI said.
OpenAI’s GPT amps up your text prompts
You can see how the GPT technology spruces up your text prompts. For example, when I typed «electric guitar with a spiky design,» GPT upgraded that to «Illustration of a distinctive electric guitar, where the primary design element is its multitude of spikes. The guitar’s body, neck and headstock are embellished with these sharp features, making it a statement piece for any rock enthusiast.»
It produces a quartet of expanded prompts. If its amped-up versions aren’t to your liking — for example, if you want to dial down GPT’s over-the-top wording amplification — you can steer it in a different direction.
«We are hoping the model will actually be able to understand natural language in a deeper way,» said Gabriel Goh, one of the OpenAI researchers who helped build Dall-E 3. The idea is to take some of the engineering out of prompt engineering, a specialty that’s emerged in technology circles among experts good at entering just the right text to cajole AI systems into producing the desired output. Instead of seeing just a jumble of words, the AI can better interpret phrases and descriptions, for example understanding that you want a mustache on a man in a scene and red hair on a woman.
Also helpful: Following ChatGPT’s more conversational interface, you can request follow-up refinements like «now add a light green psychedelic background,» and Dall-E 3 will update its previous output.
It worked well for me. For example, when Dall-E went a bit overboard with my request to show some happy worms in a box of compost, I reined it in with the request, «Make the worms a little less manic.»
Dall-E 3 can render tough details correctly
In my tests, I was often happier with the results than I was with those from Adobe’s second-generation Firefly AI for generating images. Adobe offers better controls for tuning your prompts, and it’ll suggest terms to complete a good prompt, an approach related to OpenAI’s GPT text boost. But Dall-E often rendered problem areas better, plausibly constructing guitar strings and mountain bike spokes. Hands are a notorious trouble spot for AI, but Dall-E 3 did well.
The image quality improvements come chiefly from a new AI training session that uses more carefully and accurately labeled photos, Goh said.
It wasn’t perfect. One elephant had five feet, and mountain bike pedals seem impossible for AI to grok. Dall-E 3 sometimes made a giant white halo around a subject and sidestepped the much trickier job of convincingly compositing it with a background. Those worms sometimes had faces on both ends, and they often resided in a wooden box made with the kind of construction you’d only see with a cardboard box.
New work to stop Dall-E abuse problems
With Dall-E 3, OpenAI has expanded its efforts to thwart abuse and other problems, said Sandhini Agarwal, another Dall-E team member.
It already prohibited graphic content like sexual or violent images and blocked efforts to show public figures like politicians. That system is now improved after new human oversight, OpenAI said.
Indeed, when I asked for an image of a construction worker hanging dangerously from a safety cable, the system first created its more elaborate versions of my prompt, then stopped after three out of four images with this message: «I apologize for the oversight. Some of the requested images didn’t adhere to our content policy. As a result, I was unable to generate all the images. Safety and sensitivity are of utmost importance to us.»
Editors’ note: CNET is using an AI engine to help create some stories. For more, see this post.