The week in AI: The internet is gushing over ChatGPT's new image generator

Plus: The Cybernetic Teammate - a good or bad thing?

Welcome to The Dispatch! We are the newsletter that keeps you informed about AI. Each Thursday, we aggregate the major developments in artificial intelligence - we pass along the news, useful resources, tools and services; we highlight the top research in the field as well as exciting developments in open source. Even if you aren’t a machine learning engineer, we’ll keep you in touch with the most important developments in AI.

NEWS & OPINION

-------------------------

With a new update, GPT-4o, the default model in ChatGPT, can create native images. Whether or not you’ve been paying much attention to the text-to-image world, we’ve decided to highlight it at the top here because... well, the images are good. Really good.

4o’s image generation is powered by a single smart model - notably not OpenAI’s previous model, DALL-E - that reads your request and creates the image with much better prompt adherence than competitors. It also works much better with text and diagrams for generating things like figures, charts, infographics, etc. You’ll still likely have to prompt engineer a bit to get what you want in that vein, but this is an impressive step up in that department all the same.

Native image generation also finally solves “editing by prompts” and should probably have Photoshop on high alert. You can just give ChatGPT an image (or have it generate one) and edit it into something else with plain English. You can create really good-looking ads in just minutes (or you can join everyone else on social media right now and turn all of your pictures into a Studio Ghibli anime).

OpenAI has also relaxed the content moderation filters a lot for this new mode - so it’s less likely to refuse legitimately non-harmful requests, but it’s probably also much easier for malicious actors to abuse.

Paying subscribers should already have access, and the new feature has rolled out to free users as well - but now with a limit of 3 generations per day for that tier due to unexpectedly heavy use.

This type of development clearly has heavy implications for artists and visual designers. There is a major ongoing lawsuit against Midjourney and Stability AI, and the artists have so far claimed a small win - but however the chips fall, that case will likely set a major precedent for the future of artist protections.

-------------------------

We just can’t seem to stop talking about Gemini.

Last week, we highlighted some of the impressive new features Google was bringing to Gemini. Turns out Google was saving the best for last, as Gemini has officially left the 2.0 era with the release of Gemini 2.5 Pro - a reasoning/thinking model that not only has the best benchmarks in AI that we’ve seen to date, but is also currently #1 by a wide margin on LMArena (crowd-ranked head-to-head output rankings for LLMs) and on Scale AI’s more comprehensive leaderboard. Industry pundits are already calling it “the best model ever created”.

Should you believe the hype? In short, yes. At this stage, it’s becoming difficult to communicate how the models we are getting these days are actually better. To stay informed, you need a nuanced view built on your own set of tests rather than on published benchmarks alone, and of course, you need to try the models yourself.

But Google, with its immense infrastructure, talent, and access to data, has been a pretty safe bet for the question of “Who will have the best models in a few years?” Google took a long time to reach this point, overcoming Bard’s disastrous launch and some integration headaches - but when you look at where AI is heading (particularly with regard to multimodality and future adoption across platforms beyond a chatbot interface), Google is really, really well-positioned for the future of AI. And at this point Gemini is still not perfect - but you could reasonably cancel your ChatGPT or Claude subscription and possibly not miss them. We definitely would not have said that about Gemini a year or even a few months ago.

Our Gemini article from last week highlighted the seemingly small wins Google has been accruing, which are hard to see if you’re just skimming benchmarks or not paying close attention.

While Gemini 2.5 Pro is currently paywalled behind a Gemini Advanced subscription on the Gemini app, the model is free to use on Google’s AI Studio (up to 50 messages/day). If you haven’t used AI Studio before, don’t let the UI being slanted more towards developers dissuade you from testing it out. API pricing has not been announced yet but will be in the coming weeks.

TRENDING AI TOOLS, APPS & SERVICES

  • Cursor: popular AI-powered code editor added custom modes, so you can define how the AI needs to work and what tools it can access to suit your workflow

  • Mind Maps in NotebookLM: new interactive visual creation tool from Google

  • Ideogram: text-to-image generator released upgraded version 3.0 (assuming you aren’t using ChatGPT for that now)

  • AI SDK by Vercel: now supports reasoning, MCP clients, image generation

  • Stripe: built a VS Code/Cursor assistant that helps you integrate Stripe payments in your products using AI-written code

  • Agent Teams by Agno: combine multiple specialized agents, each focused on a different aspect of the problem, to work reliably toward a single goal

  • Together AI: released a chat app to interact with all the open source models like DeepSeek, Llama, Qwen, Flux, etc.

  • OpenAI.fm: an interactive demo for developers to try the new text-to-speech model in the OpenAI API

VIDEOS, SOCIAL MEDIA & PODCASTS

  • Google won. (Gemini 2.5 Pro is insane) [YouTube]

  • New ChatGPT voice mode updates [YouTube]

  • Chinese AI pioneer Kai-Fu Lee questions OpenAI’s sustainability [YouTube]

  • Perplexity CEO on new answer modes, enhancing searches on specific verticals with entities like images, videos, and cards with built-in commercial transactions [X]

  • xAI’s Grok is now directly embedded into Telegram [X]

  • xAI’s Grok is also openly rebelling against its owner, Elon Musk [Reddit]

  • Discussion on Gemini 2.5 Pro benchmarks [Reddit]

  • Lex Fridman Podcast: ThePrimeagen on programming, AI, ADHD, productivity, addiction, and God [Podcast]

TECHNICAL NEWS, DEVELOPMENT, RESEARCH & OPEN SOURCE

  • DeepSeek is at it again - open sources an upgraded V3 hybrid-reasoning model (like Claude 3.7) with an MIT license

  • Alibaba’s Qwen team open sources two models: Qwen2.5-VL-32B-Instruct, a new vision-language model featuring enhanced mathematical reasoning and visual capabilities, and Qwen2.5-Omni-7B, a new small multimodal model capable of processing text, images, audio, and video simultaneously

  • Google research: neural activity in the human brain aligns linearly with the internal contextual embeddings of speech and language within LLMs

  • OpenAI releases new audio models (transcription and text-to-speech) for developers and now accepts MCP in their Agents SDK

  • SynCity: training-free generation of 3D worlds

  • ByteDance’s InfiniteYou: an open source AI portrait generator that produces consistent portraits with enhanced facial accuracy and prompt adherence

That’s all for this week! We’ll see you next Thursday.