The week in AI: The internet is gushing over ChatGPT's new image generator
Plus: The Cybernetic Teammate - a good or bad thing?
Welcome to The Dispatch! We are the newsletter that keeps you informed about AI. Each Thursday, we aggregate the major developments in artificial intelligence - we pass along the news, useful resources, tools and services; we highlight the top research in the field as well as exciting developments in open source. Even if you aren’t a machine learning engineer, we’ll keep you in touch with the most important developments in AI.
NEWS & OPINION
-------------------------
With a new update, GPT-4o, the default model in ChatGPT, can create native images. Whether or not you’ve been paying much attention to the text-to-image world, we’ve decided to highlight it at the top here because... well, the images are good. Really good.
4o’s image generation is powered by a single smart model - notably not OpenAI’s previous model, DALL-E - that reads your request and creates the image with much better prompt adherence than competitors. It also works much better with text and diagrams for generating things like figures, charts, infographics, etc. You’ll still likely have to prompt engineer a bit to get what you want in that vein, but this is an impressive step up in that department all the same.
Native image generation also finally solves “editing by prompts” and should probably have Photoshop on high alert. You can just give ChatGPT an image (or have it generate one) and edit it into something else with plain English. You can create really good-looking ads in just minutes (or you can join everyone else on social media right now and turn all of your pictures into a Studio Ghibli anime).
OpenAI has also relaxed the content moderation filters a lot for this new mode - so it’s less likely to refuse legitimately non-harmful requests, but probably much easier for malicious actors to abuse.
Paying subscribers should already have access, and the new feature has rolled out to free users as well - but now with a limit of 3 generations per day for that tier due to unexpectedly heavy use.
This type of development clearly has heavy implications for artists and visual designers. There is a major ongoing lawsuit against Midjourney and Stability AI, and the artists have so far claimed a small win - but however the chips fall, that case will likely set a major precedent for the future of artist protections.
-------------------------
We just can’t seem to stop talking about Gemini.
Last week, we highlighted some of the impressive new features Google was bringing to Gemini. Turns out Google was saving the best for last: Gemini has officially left the 2.0 era with the release of Gemini 2.5 Pro, a reasoning/thinking model. It not only has the best benchmarks in AI that we’ve seen to date, but is also currently #1 by a wide margin on LMArena (crowd-ranked head-to-head output rankings for LLMs) and on Scale AI’s more comprehensive leaderboard - all while industry pundits call it “the best model ever created”.
Should you believe the hype? In short, yes. At this stage, it’s becoming difficult to communicate how the models we are getting these days are actually better. To stay informed, you need a nuanced view built on your own set of tests rather than on the published benchmarks alone - and, of course, you need to try the models yourself.
But Google, with its immense infrastructure, talent, and access to data, has been a pretty safe bet for the question of “Who will have the best models in a few years?” Google took a long time to reach this point, overcoming Bard’s disastrous launch and some integration headaches - but when you look at where AI is heading (particularly with regard to multimodality and future adoption across platforms beyond a chatbot interface), Google is really, really well-positioned for the future of AI. And at this point Gemini is still not perfect - but you could reasonably cancel your ChatGPT or Claude subscription and possibly not miss them. We definitely would not have said that about Gemini a year or even a few months ago.
Our Gemini article from last week highlighted the seemingly small wins Google has been accruing, which are hard to see if you’re just skimming benchmarks or not paying close attention.
While Gemini 2.5 Pro is currently paywalled behind a Gemini Advanced subscription on the Gemini app, the model is free to use on Google’s AI Studio (up to 50 messages/day). If you haven’t used AI Studio before, don’t let the UI being slanted more towards developers dissuade you from testing it out. API pricing has not been announced yet but will be in the coming weeks.
MORE IN AI THIS WEEK
Wharton professor Ethan Mollick is one of the top voices in helping people understand AI. His new research with Procter and Gamble shows that having an AI on your team can increase performance, provide expertise, and improve your experience
Inside Google’s two-year frenzy to catch up with OpenAI
North Korea’s Kim Jong Un oversees tests of new AI-equipped suicide drones
Apple bets big on Nvidia with $1B AI infrastructure investment
OpenAI has released its first research into how using ChatGPT affects people’s emotional well-being
Perplexity’s bid to “rebuild TikTok in America”
Amazon is testing shopping, health assistants as it pushes deeper into generative AI
Anthropic wins early round in music publishers' (UMG) AI copyright case
Alibaba-affiliate Ant combines Chinese and US chips to slash AI development costs
TRENDING AI TOOLS, APPS & SERVICES
Cursor: popular AI-powered code editor added custom modes, so you can define how the AI needs to work and what tools it can access to suit your workflow
Mind Maps in NotebookLM: new interactive visual creation tool from Google
Ideogram: text-to-image generator released upgraded version 3.0 (assuming you aren’t using ChatGPT for that now)
AI SDK by Vercel: now supports reasoning, MCP clients, image generation
Stripe: built a VS Code/Cursor assistant that helps you integrate Stripe payments in your products using AI-written code
Agent Teams by Agno: combine multiple specialized agents, each focused on a different aspect of the problem, to work reliably for a single goal
Together AI: released a chat app to interact with all the open source models like DeepSeek, Llama, Qwen, Flux, etc.
OpenAI.fm: an interactive demo for developers to try the new text-to-speech model in the OpenAI API
GUIDES, LISTS, PRODUCTS, UPDATES, INFORMATIVE, INTERESTING
Microsoft 365 Copilot unveils Researcher and Analyst, two new AI agents designed to handle workplace tasks with research and data analysis directly in the workflow
Vibe Coding 101 with Replit and DeepLearning.AI
Google DeepMind unveils new Gemini API guide to bridge LLMs with real-world tools and APIs
How I force LLMs to generate correct code
Otter has launched a suite of AI meeting agents
Whalesync’s new AI tools directory: search and filter through hundreds of the top AI tools launching every day
Claude can now search the web (we had a blurb on this last week but it was in the social media section as Anthropic themselves hadn’t officially posted yet); also Claude has a new “Think” tool, separate from extended thinking, to improve problem-solving performance
Cloudflare turns AI against itself with endless maze of irrelevant facts
ARC-AGI Prize returns to challenge AI reasoning
VIDEOS, SOCIAL MEDIA & PODCASTS
Google won. (Gemini 2.5 Pro is insane) [YouTube]
New ChatGPT voice mode updates [YouTube]
Chinese AI pioneer Kai-Fu Lee questions OpenAI’s sustainability [YouTube]
Perplexity CEO on new answer modes, enhancing searches on specific verticals with entities like images, videos, and cards with built-in commercial transactions [X]
xAI’s Grok is now directly embedded into Telegram [X]
xAI’s Grok is also openly rebelling against its owner, Elon Musk [Reddit]
Discussion on Gemini 2.5 Pro benchmarks [Reddit]
Lex Fridman Podcast: ThePrimeagen on programming, AI, ADHD, productivity, addiction, and God [Podcast]
TECHNICAL NEWS, DEVELOPMENT, RESEARCH & OPEN SOURCE
DeepSeek is at it again - open sources an upgraded V3 hybrid-reasoning model (like Claude 3.7) with an MIT license
Alibaba’s Qwen team open sources Qwen2.5-VL-32B-Instruct: a new vision-language model featuring enhanced mathematical reasoning and visual capabilities and Qwen2.5-Omni-7B: a new small multimodal model capable of processing text, images, audio, and video simultaneously
Google research: neural activity in the human brain aligns linearly with the internal contextual embeddings of speech and language within LLMs
OpenAI releases new audio models (transcription and text-to-speech) for developers and now accepts MCP in their Agents SDK
SynCity: training-free generation of 3D worlds
ByteDance’s InfiniteYou: an open source AI portrait generator that produces consistent portraits with enhanced facial accuracy and prompt adherence
That’s all for this week! We’ll see you next Thursday.