- WeeklyDispatch.AI
The week in AI: World modeling - how the Godmother of AI and Google are creating interactive environments from a single image
Plus: OpenAI's Twelve Days of 'Shipmas'
Welcome to The Dispatch! We are the newsletter that keeps you informed about AI. Each Thursday, we aggregate the major developments in artificial intelligence - we pass along the news, useful resources, tools and services; we highlight the top research in the field as well as exciting developments in open source. Even if you aren’t a machine learning engineer, we’ll keep you in touch with the most important developments in AI.
(It’s buried in the social media section, but what has everyone buzzing this week is Sam Altman’s announcement of the 12 Days of ‘Shipmas’, which started today with a $200/mo version of ChatGPT and the full o1 release.)
NEWS & OPINION
-------------------------
Text-to-image, text-to-video, image-to-video… What’s next in generative AI - image-to-interactive environment?
Well, yes. Lightning struck twice this week as both Fei-Fei Li’s heavily hyped startup World Labs and Google DeepMind unveiled new foundation ‘world models’. The releases look like a good first step toward AI systems that are capable of creating immersive, interactive environments on the fly - and perhaps even a step closer to AGI, per Google:
“While this research is still in its early stage with substantial room for improvement on both agent and environment generation capabilities, we believe Genie 2 is the path to solving a structural problem of training embodied agents safely while achieving the breadth and generality required to progress towards AGI.”
World Labs’ browser-based system transforms any image into an explorable 3D environment - complete with real-time camera effects, interactive lighting, and animation sliders. The model also offers control over visual elements like depth-of-field and a dolly zoom. DeepMind’s Genie 2 takes a more technically robust and ‘video game’ oriented direction, combining world generation with real-time physics, character controls, and spatial memory.
We’re still in the very earliest development stages, but these models signal a future where creating and interacting with complex 3D spaces might be as simple and scalable as generating static images is today. You can join the waitlist for World Labs now; Genie 2 will probably be locked behind DeepMind’s closed doors for some time (no word on testing/public release dates yet).
-------------------------
AWS’ re:Invent 2024 conference started this week in Las Vegas, and every single topic on the agenda intersects with artificial intelligence in some way. Amazon has been a laggard, to say the least, in the AI space - but it looks like they are finally starting to show some teeth. re:Invent 2024 runs through Friday, but there have been quite a few juicy AI reveals already:
Amazon is building what is expected to be the world’s largest AI supercomputer with Anthropic. For a more technical lens on this, SemiAnalysis has an impressive breakdown of Amazon’s quest for AI sovereignty from a hardware perspective.
A suite of new multimodal foundation models targeting text, image, and video generation called Nova was announced. Three of the six Nova models are available now, with an image generator, video generator, and an advanced reasoning model soon to come.
According to benchmarking from Artificial Analysis, the Nova models are competitive with (but not better than) current state-of-the-art LLMs. But with everyone focused on model capabilities, few are noticing the very low costs: Nova Pro is priced at $0.80 per million input tokens, significantly cheaper than Claude 3.5 Sonnet’s $3.00 and GPT-4o’s $2.50 despite similar performance. Nova models are also the fastest models in their respective intelligence classes in Amazon Bedrock.
The new Automated Reasoning checks feature in Bedrock can supposedly eliminate ‘hallucinations’. Color us skeptical. The checks attempt to figure out how a model arrived at an answer - and discern whether the answer is correct. Customers upload info to establish a ground truth of sorts, and the Automated Reasoning checks create rules that can then be refined and applied to a model. It’s too soon to say how effective this feature is, as the company has volunteered no data on the checks thus far.
The next generation of SageMaker was unveiled, integrating analytics and ML tools into a unified platform. The upgrades, including Lakehouse and Unified Studio capabilities, allow enterprises to seamlessly link data from various sources for faster AI app development.
A number of new tools were unveiled to simplify retrieval augmented generation (RAG) workflows for both structured and unstructured data, including Amazon Bedrock Knowledge Bases and GraphRAG. These features can automate complex tasks like generating SQL queries and creating knowledge graphs, enabling enterprises to build more accurate, intelligent AI applications without custom coding/expertise.
Amazon got what feels like a late start in the AI race, but collectively these announcements are the company’s biggest enterprise play so far.
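To put the input-token prices quoted above in perspective, here is a rough back-of-the-envelope sketch. It uses only the three per-million input prices mentioned in this issue; the function names and the 500-million-token monthly workload are hypothetical, and output-token pricing (which also matters in practice) is deliberately left out.

```python
# Input-token prices in USD per million tokens, as quoted in this issue.
# Output-token prices are not covered here, so real bills will differ.
PRICE_PER_MILLION_INPUT = {
    "Nova Pro": 0.80,
    "GPT-4o": 2.50,
    "Claude 3.5 Sonnet": 3.00,
}


def input_cost(model: str, input_tokens: int) -> float:
    """Return the input-token cost in USD for a given token volume."""
    return PRICE_PER_MILLION_INPUT[model] * input_tokens / 1_000_000


def cost_table(input_tokens: int) -> dict[str, float]:
    """Return the input cost per model, rounded to cents."""
    return {
        model: round(input_cost(model, input_tokens), 2)
        for model in PRICE_PER_MILLION_INPUT
    }


if __name__ == "__main__":
    monthly_tokens = 500_000_000  # hypothetical workload, for illustration only
    baseline = input_cost("Nova Pro", monthly_tokens)
    for model, cost in cost_table(monthly_tokens).items():
        print(f"{model}: ${cost:,.2f} ({cost / baseline:.2f}x Nova Pro)")
```

On input tokens alone, that works out to GPT-4o costing a little over 3x and Claude 3.5 Sonnet a little under 4x what Nova Pro does, whatever the volume.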
-------------------------
The Biden administration has introduced further export restrictions aimed at limiting China's access to advanced chip technology, a move made as part of the broader effort to curb Beijing's ambitions in artificial intelligence. The measures include a ban on the export of chipmaking equipment and high-bandwidth memory chips critical for AI data centers, as well as the blacklisting of 140 more entities linked to China's chip sector. Commerce Secretary Gina Raimondo described the rules as "groundbreaking and sweeping".
However, delays in implementation and concessions to foreign partners and industry stakeholders have weakened the immediate impact of the controls - and analysts noted that China has had time to stockpile advanced chips, potentially enough to carry it to self-sufficiency in semiconductors. Jordan Schneider’s ChinaTalk has an extensive and mostly critical analysis of the controls.
China responded by announcing retaliatory restrictions on key critical minerals like gallium, germanium, and graphite. The US relies heavily on imports for these materials, and China is far and away the main supplier (as it is for the rest of the world).
The Biden administration's negotiations with allies like the Netherlands and Japan aimed to create a united front on restricting China's access to advanced tools, but they led to further delays. With the administration's term ending, attention will shift to the incoming Trump administration to see whether it will strengthen these measures or take a different approach.
MORE IN AI THIS WEEK
OpenAI weighs plan for ads in ChatGPT to bolster revenue
Defense tech firm Anduril announces a new strategic partnership with OpenAI to develop AI-powered aerial defense systems
Google DeepMind’s new AI model is the best yet at weather forecasting
Salesforce shares jump on earnings beat and strong AI deals pipeline
Databricks closes in on multibillion funding round at $55 billion valuation to help employees cash out
The race is on to make AI agents do your online shopping for you
Landlords are using AI to raise rent - and cities are pushing back
Over half of longer English-language posts on LinkedIn are AI-generated
Elon Musk targets OpenAI’s for-profit transition in a new filing
Create, Publish & Earn with Synthflow AI Voice Agents Marketplace
TRENDING AI TOOLS, APPS & SERVICES
TwinMind: connects to your calendar and task manager to analyze your workflow and suggest ways to stay on top of your day
NewsBang: organizes trending news articles, videos, and podcasts into a personalized, swipeable feed for quick browsing
Voice Control by Hume: new feature allows developers to create consistent, custom AI voices by adjusting 10 voice attribute sliders
ElevenLabs GenFM: generate personalized podcasts from PDFs, articles, eBooks, links or text in 32 languages
ElevenLabs Conversational AI: a new tool that allows users to add voice capabilities in 31 languages to AI agents
screenshot-to-code: drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)
DataFuel: turn websites into LLM-ready data and scrape knowledge bases in a single query
GUIDES, LISTS, PRODUCTS, UPDATES, INFORMATIVE
Exa.ai introduces Websets: a web-scale embeddings-based search engine that lets you create comprehensive sets of data (of anything) by simply typing what you want
Google’s Veo video generation model is now available in private preview on the company’s Vertex AI platform
Black Friday spending broke records this year as a fifth of consumers used AI chatbots to find deals
Learn how RAG improves LLM applications with real-world use cases
Learn how to use Apple Intelligence’s ‘Image Playground’
Exa opens the waitlist for an AI that can generate web-based datasets from any prompt
VIDEOS, SOCIAL MEDIA & PODCASTS
OpenAI’s “12 Days of OpenAI 🎄🎅” (live product demos/streams) starts today [X]
GenChess turns your ideas into playable art/chess pieces using Google’s Imagen 3 model [X]
ChatGPT refuses to say the name “David Mayer,” and no one knows why [X] (mystery solved)
Logan Kilpatrick (Google AI Studio/former OpenAI) claims you will be fu*ked soon if your life plan assumes intelligence has positive market value [Reddit]
New York Times: The next frontier - Sam Altman on the future of AI and society [YouTube]
Bloomberg: Why 2025 will be the year of AI agents [YouTube]
The Browser Company teases Dia, a new AI-integrated smart web browser with agentic actions and natural language prompting in the command bar [Video]
TECHNICAL NEWS, DEVELOPMENT, RESEARCH & OPEN SOURCE
Tencent releases HunyuanVideo: open-source, open-weights, 13B parameter video generation model that beats top closed models in testing
Google’s PaliGemma2: a family of versatile vision language models
Liquid AI’s new STAR model architecture outshines transformers (90% cache size reduction versus traditional ML transformers)
Nous Research pre-trains a 15B parameter language model in a decentralized fashion over the internet (déjà vu here - a new trend for pre-training?)
Hume + Anthropic ‘Computer Use’ demo on Replit - allows developers to create apps to control a computer with just voice
That’s all for this week! We’ll see you next Thursday.