We’re coming up on the one-year anniversary of OpenAI’s release of its first “omni,” or multimodal, model, GPT-4o, in May 2024, but that old standby still has some tricks up its sleeve.
Case in point: today OpenAI finally turned on the native multimodal image generation capabilities of GPT-4o for users of its hit chatbot ChatGPT on the Plus, Pro, Team, and Free usage tiers, and the company said the feature would soon also be made available to Enterprise and Edu users and through its application programming interface (API).
Unlike the previous generative AI image model available in ChatGPT — OpenAI’s DALL-E 3, a classic diffusion transformer model that was trained to reconstruct images from text prompts by removing noise from pixels — this new image generator is part of the same model that spits out text and code, as OpenAI trained the entire model to understand all these forms of media at once.
OpenAI president Greg Brockman previewed this native capability of GPT-4o back in May 2024, but for reasons that remain publicly unknown, the company held onto it until now, following the public release of what many AI power users saw as a similar feature from Google AI Studio with its Gemini 2.0 Flash Experimental model.
The native approach has resulted in a much higher-quality image generator that produces far more lifelike images with accurate text baked in, and it’s already impressing users, one of whom calls the quality “insane.”

By the same token (pun intended), OpenAI still hasn’t said precisely what data GPT-4o’s image generation capabilities were trained on — and given the history of the company and other model providers, it likely includes many artworks scraped from the web, some of which are presumably copyrighted, which is likely to anger the artists behind them.
Bringing Image Generation to ChatGPT and Sora
OpenAI has long aimed to make image generation a core capability of its AI models. With GPT-4o, users can now generate images directly in ChatGPT, refining them through conversation and adjusting details on the fly.
The model also integrates into Sora, OpenAI’s video-generation platform, further expanding multimodal capabilities.
In an announcement on X, OpenAI confirmed that GPT-4o’s image generation is designed to:
- Accurately render text within images, allowing for the creation of signs, menus, invitations, and infographics.
- Follow complex prompts with precision, maintaining high fidelity even in detailed compositions.
- Build upon previous images and text, ensuring visual consistency across multiple interactions.
- Support various artistic styles, from photorealism to stylized illustrations.
Users can describe an image in ChatGPT, specifying details such as aspect ratio, color schemes (hex codes), or transparency, and GPT-4o will generate it within a minute.
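As a rough illustration of that kind of request, here is a minimal Python sketch that assembles a prompt from a subject, an aspect ratio, hex color codes, and a transparency flag. The function name `build_image_prompt` and the prompt wording are hypothetical, chosen only for this example; OpenAI has not published a required prompt format, and the result is simply text a user could paste into ChatGPT.

```python
import re

# Six-digit hex colors such as "#1A1A2E" (assumed format for this sketch).
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def build_image_prompt(subject, aspect_ratio=None, hex_colors=(),
                       transparent_background=False):
    """Compose a plain-text image request; hypothetical helper, not an OpenAI API."""
    invalid = [c for c in hex_colors if not HEX_COLOR.fullmatch(c)]
    if invalid:
        raise ValueError(f"Not six-digit hex colors: {invalid}")

    # Start with the subject, normalized to end in a single period.
    parts = [subject.strip().rstrip(".") + "."]
    if aspect_ratio:
        parts.append(f"Aspect ratio: {aspect_ratio}.")
    if hex_colors:
        parts.append("Color palette: " + ", ".join(hex_colors) + ".")
    if transparent_background:
        parts.append("Transparent background.")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_image_prompt(
        "A poster for a jazz night",
        aspect_ratio="2:3",
        hex_colors=["#1A1A2E", "#E94560"],
        transparent_background=True,
    ))
```

Validating the hex codes up front catches typos before the request is sent, since a malformed color in free text would otherwise just be passed along as part of the description.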
As independent AI consultant Allie K. Miller wrote on X, it’s a “Huge leap in text generation,” and is “the best” AI image generation model she’s seen.

Key capabilities and use cases
GPT-4o is designed to make image generation not just visually stunning but also practical. Some of the key applications include:
- Design & Branding – Generate logos, posters, and advertisements with precise text placement.
- Education & Visualization – Create scientific diagrams, infographics, and historical imagery for learning.
- Game Development – Maintain character consistency across different design iterations.
- Marketing & Content Creation – Produce social media assets, event invitations, and digital illustrations tailored to brand needs.
How GPT-4o improves generative images over DALL-E
According to OpenAI’s official thread on X, GPT-4o introduces several improvements over previous models:
- Better text integration: Unlike past AI models that struggled with legible, well-placed text, GPT-4o can now accurately embed words within images.
- Enhanced contextual understanding: GPT-4o leverages chat history, allowing users to refine images interactively and maintain coherence across multiple generations.
- Improved multi-object binding: While previous models had difficulty correctly positioning many distinct objects in a scene, GPT-4o can now handle up to 10-20 objects at once.
- Versatile style adaptation: The model can generate or transform images into a variety of styles, from hand-drawn sketches to high-resolution photorealism.
Limitations
Despite its advancements, GPT-4o still has some known challenges:
- Cropping Issues: Large images, such as posters, may sometimes be cropped too tightly.
- Text Accuracy in Non-Latin Scripts: Some non-English characters may not render correctly.
- Detail Retention in Small Text: Highly detailed or small-font text may lose clarity.
- Editing Precision: Modifying specific parts of an image may inadvertently affect other elements.
OpenAI is actively addressing these issues through ongoing model refinements.
Safety and labeling measures
As part of OpenAI’s commitment to responsible AI development, all GPT-4o-generated images include C2PA metadata, allowing users to verify their AI origin.
Moreover, OpenAI has built an internal search tool to help detect AI-generated images.
Strict safeguards are in place to block harmful content and prevent misuse, such as prohibiting explicit, deceptive, or harmful imagery.
OpenAI also ensures that images featuring real people are subject to heightened restrictions.
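For readers curious what the C2PA labeling mentioned above looks like in practice, the following Python sketch scans a downloaded image's raw bytes for markers that typically accompany an embedded C2PA manifest. It is a heuristic only, under the assumption that the manifest is embedded in the file as a JUMBF box labeled `c2pa` (carried in a `caBX` chunk in PNG files); it does not validate signatures, and the function names are invented for this example. Real verification requires a full C2PA validator.

```python
from pathlib import Path

# Byte markers that typically appear when a C2PA manifest is embedded:
# "c2pa" is the manifest store label, "jumb" is the JUMBF superbox type,
# and "caBX" is the PNG chunk type used to carry the manifest.
C2PA_MARKERS = (b"c2pa", b"jumb", b"caBX")


def find_c2pa_markers(data: bytes) -> list[str]:
    """Return which known markers occur in the raw bytes (heuristic only)."""
    return [m.decode("ascii") for m in C2PA_MARKERS if m in data]


def looks_c2pa_tagged(path: str) -> bool:
    """True if the file contains the 'c2pa' label; does not check signatures."""
    return "c2pa" in find_c2pa_markers(Path(path).read_bytes())


if __name__ == "__main__":
    import sys

    for image_path in sys.argv[1:]:
        status = "markers found" if looks_c2pa_tagged(image_path) else "no markers"
        print(f"{image_path}: {status}")
```

Because metadata can be stripped when an image is screenshotted or re-encoded, a missing marker does not prove an image was made by a person, and a present marker is only a hint until the manifest itself is cryptographically verified.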
OpenAI CEO Sam Altman described the release as a “new high-water mark for creative freedom,” emphasizing that users will be able to create a wide range of visuals, with OpenAI observing and refining its approach based on real-world usage.
As AI-generated images become more precise and accessible, GPT-4o represents a significant step forward in making text-to-image generation a mainstream tool for communication, creativity, and productivity.
