The week in AI: TikTok developer ByteDance's 'OmniHuman' erases the line between real and AI-generated
Plus: OpenAI releases o3-mini & "deep research" agent
Welcome to The Dispatch! We are the newsletter that keeps you informed about AI. Each Thursday, we aggregate the major developments in artificial intelligence - we pass along the news, useful resources, tools and services; we highlight the top research in the field as well as exciting developments in open source. Even if you aren’t a machine learning engineer, we’ll keep you in touch with the most important developments in AI.
NEWS & OPINION
-------------------------
Early last week, Sam Altman praised the success of DeepSeek R1 while teasing that OpenAI would pull some of their releases forward in response. Within six days, the company had introduced both a new reasoning model available to everyone and a PhD-level research assistant.
Let’s start with o3-mini, their newest cost-efficient reasoning model, available in ChatGPT (for all tiers) and via the API. o3-mini is a powerful and fast reasoning model that is particularly strong in science, math, and coding. While o1 remains the more powerful ‘general knowledge’ reasoning model, o3-mini provides an accessible alternative focused on STEM and coding capabilities.
For ChatGPT users:
Free users can try o3-mini by selecting 'Reason' in the message composer. For paid users, o3-mini has replaced o1-mini in the model picker.
Pro users have unlimited access to the model, while Plus and Team users get tripled rate limits (150 messages per day with o3-mini, and 50 messages per week with o3-mini-high). o3-mini-high is particularly good for coding; we’ll talk more about that in the next section, but one of our featured tools this week is a fun o3-mini-high creation that took only an hour to build.
o3-mini works with search to find up-to-date answers with links to relevant web sources, but it can’t understand images - a bit of a head-scratcher, since o1 can’t search the web but can understand images.
For developers/capability assessment:
The two o3-mini models are arguably SOTA for coding, with some caveats. Claude 3.5 Sonnet is still a tried-and-true co-pilot, and DeepSeek R1’s coding prowess has made waves even in comparison to o1, but o3-mini-high can complete tasks in one- or few-shot prompts that the other models simply can’t. However: a) OpenAI’s models continue to have a major weakness around awareness of existing code. They are more likely than Claude or DeepSeek to assume your working code needs some retroactive bulldozing, which occasionally makes them frustrating to work with; and b) the models require more context and higher-level prompting to get the most out of them. Lazy prompting can produce surprisingly poor results with o3-mini; the sketch below shows the kind of context-heavy request that tends to fare better.
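To make that concrete, here is a minimal sketch, assuming the standard OpenAI Python SDK and an API key in the environment, of a context-heavy coding request. The file path, constraints, and task are hypothetical placeholders, not a real project.

```python
# Hypothetical example: give o3-mini the existing code plus explicit
# constraints, instead of a one-line "fix my code" request.
# Assumes the standard OpenAI Python SDK and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

existing_code = open("utils/date_parser.py").read()  # placeholder path

response = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="high",  # roughly the "o3-mini-high" behavior via the API
    messages=[
        {
            "role": "developer",
            "content": (
                "You are helping maintain an existing Python 3.11 codebase. "
                "Do not rewrite working functions; change only what the task "
                "requires, and call out anything that alters existing behavior."
            ),
        },
        {
            "role": "user",
            "content": (
                "Here is the current module:\n\n"
                + existing_code
                + "\n\nAdd timezone-aware parsing for ISO 8601 strings with "
                "offsets, keeping the public function signatures unchanged."
            ),
        },
    ],
)
print(response.choices[0].message.content)
```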
The o3-mini API is very robust out of the gate. It’s the first of OpenAI’s reasoning models to ship with function calling, structured outputs, and developer messages, making it production-ready. The model offers three reasoning effort levels (low, medium, high), letting you optimize for either complex tasks or faster responses.
Developers in API usage tiers 3-5 can access o3-mini via the Chat Completions, Assistants, and Batch APIs.
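For the features named above, here is a hedged sketch, again assuming the standard OpenAI Python SDK, that combines a developer message, an explicit reasoning effort level, and Structured Outputs; the schema and task are illustrative placeholders, not taken from OpenAI’s docs.

```python
# Illustrative only: developer message + reasoning_effort + a strict JSON
# schema via Structured Outputs. Assumes the standard OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="medium",  # "low" | "medium" | "high"
    messages=[
        {"role": "developer", "content": "Extract meeting details and reply only with JSON."},
        {"role": "user", "content": "The design review is on March 3rd at 10am with Dana and Lee."},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "meeting",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "date": {"type": "string"},
                    "time": {"type": "string"},
                    "attendees": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["date", "time", "attendees"],
                "additionalProperties": False,
            },
        },
    },
)
print(response.choices[0].message.content)  # JSON conforming to the schema
```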
OpenAI also launched Deep Research, a new “agent” in ChatGPT based on the full o3 model, which has yet to be publicly released. Similar to Gemini’s feature of the same name, it scans multiple websites, reasons over them, and adds an intelligent point of view to compile something close to a full-blown research report. Google’s implementation mostly just summarizes whatever it happens to find; OpenAI’s tool goes well beyond that.
To be fair, Gemini’s deep research costs $20/month with unlimited reports, while ChatGPT’s is Pro-only ($200/month) and capped at 100 queries a month - but again, the capability gap is pretty huge. Reception to the tool has been extremely positive overall. Dan Shipper from Every took it for a spin and called it a “bazooka for the curious mind”. Ethan Mollick used deep research to create a 30-page report on tabletop games (it takes a moment to load) and shared his views on his popular blog, One Useful Thing. American economist Tyler Cowen calls the level of accuracy and clarity stunning.
From our analysis, deep research can still hallucinate and is most prone to do so if you ask it about topics that are not likely to be covered by o3’s pre-training (aka very recent or very obscure). The model’s overreliance on internet retrieval in these instances means you are going to get a result closer to Google’s summarizations, likely with errors or at least very important context/concepts missing.
Unsurprisingly perhaps, the open-source community is not far behind. GPT Researcher has been steadily improving for months (see the short sketch below), and HuggingFace is currently hiring for its Open-source DeepResearch project, which already looks very impressive. Between R1 and this nascent narrowing of the gap between open and closed models, it’s not hard to see why Sam Altman recently admitted that his company is likely on the wrong side of history with regard to open source.
Well, yeah.
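If you want to try the open-source flavor yourself, here is a short sketch based on GPT Researcher’s documented usage pattern at the time of writing; the interface may have changed since, and it expects an OPENAI_API_KEY plus a web-search key such as TAVILY_API_KEY in the environment.

```python
# Sketch of running GPT Researcher programmatically; based on the project's
# README and may not match the current interface exactly.
import asyncio
from gpt_researcher import GPTResearcher

async def get_report(query: str) -> str:
    researcher = GPTResearcher(query=query, report_type="research_report")
    await researcher.conduct_research()      # gathers and reads web sources
    return await researcher.write_report()   # compiles findings into a report

if __name__ == "__main__":
    print(asyncio.run(get_report("How close are open-source deep research agents to OpenAI's?")))
```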
-------------------------
It’s hard to get away from the turbulence of the political news in the US, particularly the growing unease over Elon Musk’s ambitions for DOGE.
Now, a leaked audio recording has revealed controversial plans for artificial intelligence deployment across federal agencies, led by Thomas Shedd - a former Tesla engineer who is now heading the GSA’s Technology Transformation Services. According to the recording obtained by 404 Media, Shedd outlined plans to create "AI coding agents" that would write government software and proposed significant changes to Login.gov that would enable broader integration with sensitive systems like Social Security, despite employee concerns about legal restrictions.
During the meeting, Shedd acknowledged the concerns but maintained a stance of moving forward regardless, stating "Things are going to get intense." When confronted about privacy-law concerns, Shedd suggested they would "try to get consent" but emphasized they should "still push forward and see what we can do." The initiatives appear to be part of the larger effort to reduce the federal workforce while maintaining government programs through increased automation.
Let’s be clear: current AI systems are in no way prepared to automate at this level. Nowhere in Shedd’s comments was there a single mention of safety planning. In government technology implementations, red-teaming exercises and security protocols aren't optional features - they're fundamental requirements for responsible deployment.
The proposed changes have faced significant internal resistance, with one anonymous employee describing the reaction as "pretty unanimously negative." Of particular concern is the security risk posed by AI-generated code in federal systems, with the employee warning that "Government software is concerned with things like foreign adversaries attempting to insert backdoors into government code. With code generated by AI, it seems possible that security vulnerabilities could be introduced unintentionally."
DOGE is already facing major lawsuits, and today the US Treasury Department, after public outrage and pending a federal ruling, agreed not to give Musk’s team access to its payment systems. We might end with a popcorn emoji here if the stakes weren’t so high. Musk is an unelected private citizen, and DOGE is not a legally established agency.
-------------------------
Beyond the mainstream AI news this week, ByteDance researchers just showcased OmniHuman-1, a new AI system that can create remarkably realistic videos from just a single reference image and audio input, with the ability to handle image content types ranging from standard portraits to challenging poses and even cartoons. The system can create convincing videos of any length and style, with adjustable body proportions and aspect ratios. Everything down to the lip-syncing with the provided audio file is very nearly spot on here, with virtually none of the typical AI-generated gaffes from other video generation tools.
While ByteDance has not made the technology publicly available, emphasizing that they "do not offer services or downloads anywhere," the demonstrated capabilities signal a pivotal moment in the development of AI-generated media. Deepfakes are obviously not a new problem, but this type of tool is simply next level with regard to ease of creation and realism. If you’re an AI expert, you will probably be able to tell something is just a little bit off. The general public? Likely not.
MORE IN AI THIS WEEK
Google drops pledge not to use AI for weapons or surveillance
Meta publishes its Frontier AI Framework, remains committed to open-source development while focusing on mitigating cybersecurity and weapon risks
OpenAI said to be in talks to raise $40B (with SoftBank leading round at $15-25B) at a $340B valuation
Chinese state-linked accounts hyped DeepSeek AI launch ahead of US stock rout, Graphika says
The Beatles’ AI-assisted track “Now and Then” won the Grammy for Best Rock Performance
Ex-Google engineer charged with espionage to boost AI in China
TRENDING AI TOOLS, APPS & SERVICES
Replit iOS App: turn your ideas into apps with Replit’s new iPhone app
Gemini in Google Sheets: functionality upgrade - generate charts and valuable insights right in your spreadsheets
n8n: world's most popular workflow automation/agent platform for technical teams
Icon: the AI Admaker - create winning ads in minutes
Image to ASCII Art: convert your images into ASCII art
Geospy AI: find the location from any photo through pixel analysis
ZylerAI: your AI agent for Google Analytics
CookTok: turn TikToks into easy-to-follow recipes
Riffusion (beta): create the music you imagine
GUIDES, LISTS, PRODUCTS, UPDATES, INFORMATIVE, INTERESTING
How I use AI - early 2025 edition
Google Gemini 2.0 is now available to everyone (new models updated Feb 5, including 2.0 Pro, are free to use on Google’s AI studio ahead of the official Gemini 2.0 Pro release)
Amazon's AI revamp of Alexa assistant nears unveiling
Can you bypass Anthropic’s new Constitutional Classifiers with a universal jailbreak? Up to $30k bounty
ChatGPT on WhatsApp now also has image uploads and voice message support by texting 1-800-CHATGPT (1-800-242-8478)
Including any expletives in your search query will stop Google’s AI Overviews from appearing at the top of the results page
Google has launched "Daily Listen," an AI audio feature in the Google app that creates personalized 5-minute podcast episodes based on your Google Discover feed interests
Apple introduces Apple Invite - the AI-powered … party planner?
VIDEOS, SOCIAL MEDIA & PODCASTS
Anthropic CEO Dario Amodei on AI competition - ChinaTalk [Podcast]
Renowned AI educator Andrej Karpathy takes a deep dive into LLMs [YouTube]
DeepSeek R1 GAVE ITSELF a 2x speed boost - self-evolving LLM? [YouTube]
Andrew Ng discusses DeepSeek-R1's impact on AI, highlighting cost-effective open-weight models [X]
US AI czar David Sacks shares SemiAnalysis report: DeepSeek spent over $1B on computing, $6M training cost number ‘highly misleading’ [X]
A simple hack to chat with any GitHub repo [X]
Kanye West officially confirms the use of AI on his upcoming album ‘BULLY’ [X]
Discussion on OpenAI’s deep research capability [Reddit]
AMA with OpenAI’s Sam Altman & team [Reddit]
OpenAI uses the r/ChangeMyView subreddit to test AI persuasion [Article]
TECHNICAL NEWS, DEVELOPMENT, RESEARCH & OPEN SOURCE
Google DeepMind publishes SFT Memorizes, RL Generalizes: explaining why reinforcement learning outperforms supervised fine-tuning for generalization tasks
Mistral releases Mistral Small 3: a latency-optimized 24B-parameter (okay, it’s not that small!) model under the Apache 2.0 license
Google Cloud Research’s Learn-by-Interact enables LLM agents to self-adapt and improve performance using synthetic interaction data
ASAP: Nvidia and Carnegie Mellon teach Unitree G1 robots how to move like professional athletes
DeepSeek’s R1 and OpenAI’s Deep Research just redefined AI - RAG, distillation, and custom models will never be the same
MIT research unveils ChromoGen: an AI model that predicts 3D genome structures in minutes (rather than days) and enables advanced DNA analysis
That’s all for this week! We’ll see you next Thursday.