The week in AI: World modeling - how the Godmother of AI and Google are creating interactive environments from a single image

Plus: OpenAI's Twelve Days of 'Shipmas'

Welcome to The Dispatch! We are the newsletter that keeps you informed about AI. Each Thursday, we aggregate the major developments in artificial intelligence - we pass along the news, useful resources, tools and services; we highlight the top research in the field as well as exciting developments in open source. Even if you aren’t a machine learning engineer, we’ll keep you in touch with the most important developments in AI.

(It’s buried in the social media section, but what has everyone buzzing this week is Sam Altman’s announcement of 12 Days of ‘Shipmas’, which started today with a $200/mo version of ChatGPT and the full o1 release.)

NEWS & OPINION

-------------------------

Text-to-image, text-to-video, image-to-video… What’s next in generative AI - image-to-interactive environment?

Well, yes. Lightning struck twice this week as both Fei-Fei Li’s heavily hyped startup World Labs and Google DeepMind unveiled new foundation ‘world models’. The releases look like a good first step toward AI systems that are capable of creating immersive, interactive environments on the fly - and perhaps even a step closer to AGI, per Google:

“While this research is still in its early stage with substantial room for improvement on both agent and environment generation capabilities, we believe Genie 2 is the path to solving a structural problem of training embodied agents safely while achieving the breadth and generality required to progress towards AGI.”

World Labs’ browser-based system transforms any image into an explorable 3D environment - complete with real-time camera effects, interactive lighting, and animation sliders. The model also offers control over visual elements like depth-of-field and a dolly zoom. DeepMind’s Genie 2 takes a more technically robust and ‘video game’ oriented direction, combining world generation with real-time physics, character controls, and spatial memory.

We’re still in the very earliest development stages, but these models signal a future where creating and interacting with complex 3D spaces might be as simple and scalable as generating static images is today. You can join the waitlist for World Labs now; Genie 2 will probably be locked behind DeepMind’s closed doors for some time (no word on testing/public release dates yet).

-------------------------

AWS’ re:Invent 2024 conference started this week in Las Vegas, and every single topic on the agenda intersects with artificial intelligence in some way. Amazon has been a laggard, to say the least, in the AI space - but it looks like the company is finally starting to show some teeth. re:Invent 2024 runs through Friday, but there have been quite a few juicy AI reveals already:

  • Amazon is building what is expected to be the world’s largest AI supercomputer with Anthropic. For a more technical lens on this, SemiAnalysis has an impressive breakdown of Amazon’s quest for AI sovereignty from a hardware perspective.

  • Amazon announced Nova, a suite of new multimodal foundation models targeting text, image, and video generation. Three of the six Nova models are available now, with an image generator, video generator, and an advanced reasoning model soon to come.

  • According to benchmarking from Artificial Analysis, the Nova models are competitive with (but not better than) current state-of-the-art LLMs. But with everyone focused on model capabilities, not many are seeing the insanely low costs (Nova Pro is priced at $0.80 per million input tokens - significantly cheaper than Claude 3.5 Sonnet’s $3.00 and GPT-4o’s $2.50 despite similar performance). Nova models are also the fastest models in their respective intelligence classes in Amazon Bedrock.

  • The new Automated Reasoning checks feature in Bedrock can supposedly eliminate ‘hallucinations’. Color us skeptical. The checks attempt to figure out how a model arrived at an answer - and discern whether the answer is correct. Customers upload info to establish a ground truth of sorts, and the Automated Reasoning checks create rules that can then be refined and applied to a model. It’s too soon to say how effective this feature is, as the company has volunteered no data on the checks thus far.

  • The next generation of SageMaker was unveiled, integrating analytics and ML tools into a unified platform. The upgrades, including Lakehouse and Unified Studio capabilities, allow enterprises to seamlessly link data from various sources for faster AI app development.

  • A number of new tools were unveiled to simplify retrieval augmented generation (RAG) workflows for both structured and unstructured data, including Amazon Bedrock Knowledge Bases and GraphRAG. These features can automate complex tasks like generating SQL queries and creating knowledge graphs, enabling enterprises to build more accurate, intelligent AI applications without custom coding/expertise.

Amazon got what feels like a late start in the AI race, but collectively these announcements are the company’s biggest enterprise play so far.
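
To put the Nova pricing quoted above in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the three input-token prices mentioned in this issue; the monthly token volume is a made-up figure for illustration, and output-token pricing (which differs per model and isn't quoted here) is left out entirely.

```python
# Input-token prices in USD per million tokens, as quoted in this issue.
PRICE_PER_MILLION_INPUT = {
    "Nova Pro": 0.80,
    "GPT-4o": 2.50,
    "Claude 3.5 Sonnet": 3.00,
}


def input_cost(model: str, input_tokens: int) -> float:
    """Return the input-token cost in USD for the given model."""
    return PRICE_PER_MILLION_INPUT[model] * input_tokens / 1_000_000


# Hypothetical workload (an assumption, not a figure from any vendor):
# 500 million input tokens per month.
MONTHLY_INPUT_TOKENS = 500_000_000

for model in PRICE_PER_MILLION_INPUT:
    print(f"{model}: ${input_cost(model, MONTHLY_INPUT_TOKENS):,.2f} per month")
```

Under that assumed volume, the gap on input tokens alone works out to several hundred dollars a month versus either competitor, which is the point the benchmarks-only coverage tends to miss.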

-------------------------

The Biden administration has introduced further export restrictions aimed at limiting China's access to advanced chip technology, a move made as part of the broader effort to curb Beijing's ambitions in artificial intelligence. The measures include a ban on the export of chipmaking equipment and high-bandwidth memory chips critical for AI data centers, as well as the blacklisting of 140 more entities linked to China's chip sector. Commerce Secretary Gina Raimondo described the rules as "groundbreaking and sweeping".

However, delays in implementation and concessions to foreign partners and industry stakeholders have weakened the immediate impact of the controls - and analysts noted that China has had time to stockpile enough advanced chips to potentially tide it over until it reaches self-sufficiency in semiconductors. Jordan Schneider’s ChinaTalk has an extensive and mostly critical analysis of the controls.

China responded by announcing retaliatory export restrictions on key critical materials like gallium, germanium, and graphite. The US imports most of its supply of these materials, and China is far and away the main supplier (as it is for the rest of the world).

The Biden administration's negotiations with allies like the Netherlands and Japan aimed to create a united front against China's access to advanced tools but led to further delays. With the administration's term ending, attention will shift to the incoming Trump administration to determine whether it will strengthen these measures or take a different approach.

MORE IN AI THIS WEEK

Create, Publish & Earn with Synthflow AI Voice Agents Marketplace

  • Discover templates for routine/repetitive tasks like lead qualification and managing appointments.

  • Publish your own Voice AI solutions to help businesses thrive - and earn commissions.

  • Access custom actions that automate CRM updates, appointment scheduling, and more.

TRENDING AI TOOLS, APPS & SERVICES

  • TwinMind: connects to your calendar and task manager to analyze your workflow and suggest ways to stay on top of your day

  • NewsBang: organizes trending news articles, videos, and podcasts into a personalized, swipeable feed for quick browsing

  • Voice Control by Hume: new feature allows developers to create consistent, custom AI voices by adjusting 10 voice attribute sliders

  • ElevenLabs GenFM: generate personalized podcasts from PDFs, articles, eBooks, links or text in 32 languages

  • ElevenLabs Conversational AI: a new tool that allows users to add voice capabilities in 31 languages to AI agents

  • screenshot-to-code: drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)

  • DataFuel: turn websites into LLM-ready data and scrape knowledge bases in a single query

VIDEOS, SOCIAL MEDIA & PODCASTS

  • OpenAI’s “12 Days of OpenAI 🎄🎅” (live product demos/streams) starts today [X]

  • GenChess turns your ideas into playable art/chess pieces using Google’s Imagen 3 model [X]

  • ChatGPT refuses to say the name “David Mayer,” and no one knows why [X] (mystery solved)

  • Logan Kilpatrick (Google AI Studio/former OpenAI) claims you will be fu*ked soon if your life plan assumes intelligence has positive market value [Reddit]

  • New York Times: The next frontier - Sam Altman on the future of AI and society [YouTube]

  • Bloomberg: Why 2025 will be the year of AI agents [YouTube]

  • The Browser Company teases Dia, a new AI-integrated smart web browser with agentic actions and natural language prompting in the command bar [Video]

TECHNICAL NEWS, DEVELOPMENT, RESEARCH & OPEN SOURCE

  • Tencent releases HunyuanVideo: open-source, open-weights, 13B parameter video generation model that beats top closed models in testing

  • Google’s PaliGemma2: a family of versatile vision language models

  • Liquid AI’s new STAR model architecture outshines transformers (90% cache size reduction versus traditional ML transformers)

  • Nous Research pre-trains a 15B parameter language model decentralized over the internet (deja vu here - a new trend for pre-training?)

  • Hume + Anthropic ‘Computer Use’ demo on Replit - allows developers to create apps to control a computer with just voice

That’s all for this week! We’ll see you next Thursday.