WeeklyDispatch.AI

The week in AI: Sora is OpenAI's mind-blowing text-to-video generator

Plus: Google unveils new open source LLM


Welcome to The Dispatch! We are the newsletter that keeps you informed about AI. Each Thursday, we aggregate the week’s major developments in artificial intelligence, passing along the news, useful resources, tools, and services, and highlighting the top research in the field as well as exciting developments in open source. Even if you aren’t an engineer, we’ll keep you in touch with what’s going on in AI.

NEWS & OPINION

-------------------------

OpenAI has once again shocked the AI world with a stunning showcase of their text-to-video model, Sora. Given a brief text prompt or even a still image, Sora (the Japanese word for “sky”) can generate 1080p movie-like scenes with multiple characters, different types of motion and background details in a range of styles, up to a minute long.

While even OpenAI’s hand-picked videos have irregularities, the overall quality and frame-to-frame consistency are far beyond previous state-of-the-art models, such as Runway ML’s Gen-2, Meta’s Emu Video, and even Google’s Lumiere. Whereas competitor models feel like stop-motion sequences of AI images, Sora’s videos are surprisingly - and perhaps disturbingly - realistic.

Sora’s model is based on a transformer architecture, and has drawn criticism from OpenAI’s competitors for not having a true understanding of the physical world. Given that, it’s perhaps no coincidence that the same day Sora was announced, Meta released V-JEPA - its own video model architecture, which uses latent-space abstractions (rather than pixel-based recognition like Sora) to detect and understand the physics behind interactions between objects in videos.

It’s possible most people will care less about how an AI model makes the videos than about how good the videos are; and even though there’s still a long way to go (some of the videos are comically erroneous/bizarre), the AI buzz phrase “this is the worst it will ever be” applies. Sora is currently being red-teamed for safety, and access has been granted to a number of visual artists, designers, and filmmakers to gather more feedback. There’s no word yet on a public release date or beta sign-up.

-------------------------

Barely two months after releasing Gemini to the public (and just a week after announcing Gemini Ultra 1.0), Google is already introducing Gemini 1.5. The new iteration comes with a massive expansion of its context window and the adoption of a "Mixture of Experts" (MoE) architecture, promising to make the AI both faster and more efficient. The new model also includes expanded multimodal capabilities (when given a 44-minute silent Buster Keaton movie, the model can accurately analyze various plot points and events, and even reason about small details in the movie that could easily be missed).
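To make the “Mixture of Experts” idea concrete, here is a toy sketch in pure Python (an illustration only, not Google’s implementation): a gating function scores a set of small “expert” networks for each input and runs only the top-k scorers, so just a fraction of the model’s parameters are used per token - which is where the speed and efficiency gains come from.

```python
import math

def softmax(xs):
    """Convert raw gate scores into probabilities."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_weights, top_k=2):
    """Route input x to the top_k highest-scoring experts.

    experts: list of callables (toy stand-ins for expert networks)
    gate_weights: one gating weight per expert
    Only the selected experts run, which is why an MoE model can be
    faster per token than a dense model of the same total size.
    """
    scores = softmax([w * x for w in gate_weights])
    ranked = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)
    chosen = ranked[:top_k]
    total = sum(scores[i] for i in chosen)
    # Output is the gate-weighted sum over just the chosen experts.
    return sum(scores[i] / total * experts[i](x) for i in chosen)

# Four toy "experts", each just a different simple function.
experts = [lambda x: 2 * x, lambda x: x + 1, lambda x: -x, lambda x: x * x]
gates = [0.5, 1.0, -0.3, 0.1]
y = moe_forward(3.0, experts, gates, top_k=2)
```

In a real MoE transformer the experts are feed-forward sublayers and the gate is a learned linear layer over the token embedding, but the routing principle is the same.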

The ability to process up to 1 million tokens dwarfs the capabilities of its competitors (ChatGPT’s context window is 128,000 tokens). Google CEO Sundar Pichai highlighted the transformative potential of this feature, stating: "This allows use cases where you can add a lot of personal context and information at the moment of the query ... I view it as one of the bigger breakthroughs we have done."

Gemini 1.5 also shows impressive “in-context learning” skills, meaning that it can learn a new skill from information given in a long prompt. This negates the need to fine-tune a model to make it perform better at specific tasks. Gemini 1.5 is being rolled out to select developers and enterprise users now, and you can apply for early access. No word on the official release date other than “soon”.
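For a rough sense of scale, the back-of-the-envelope sketch below converts token counts into pages of text using the common heuristic of ~4 characters per token for English (a heuristic assumption, not Gemini’s actual tokenizer):

```python
# Rough back-of-the-envelope conversion: English text averages
# ~4 characters per token (heuristic only; real tokenizers vary).
CHARS_PER_TOKEN = 4
CHARS_PER_WORD = 5     # average English word plus a space
WORDS_PER_PAGE = 500   # a typical single-spaced page

def tokens_to_pages(tokens, chars_per_token=CHARS_PER_TOKEN):
    """Estimate how many pages of plain text fit in a context window."""
    chars = tokens * chars_per_token
    words = chars / CHARS_PER_WORD
    return words / WORDS_PER_PAGE

# Gemini 1.5's window vs. the 128,000-token window cited above:
gemini_pages = tokens_to_pages(1_000_000)  # ~1,600 pages
chatgpt_pages = tokens_to_pages(128_000)   # ~205 pages
```

By this estimate, a 1-million-token window holds on the order of 1,600 pages of text in a single prompt - enough for several full-length books or, as in Google’s demo, an entire film transcript.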

-------------------------

Lenny Rachitsky, author of the popular weekly advice column Lenny’s Newsletter, has compiled 20 examples of how people are using custom GPTs to make their teams more productive at work.

Custom GPTs give non-developers a unique way to customize and leverage a cutting-edge AI model to fit their specific use cases. Maybe you could use some help drafting documentation for work, but you know ChatGPT doesn’t have enough context/data about what you do to create anything of value. With just a little effort and experimentation, you can remedy that. Or maybe you want to converse with an “expert” on a particular subject or research paper - just upload the relevant documentation, provide some guidelines for the GPT’s behavior, and you’ll get a much better result for your inputs than with the default ChatGPT. Use cases are limited only by the imagination.

(Custom GPTs require a ChatGPT Plus subscription; machine learning platform Hugging Face offers a free, open source alternative to GPTs called Hugging Chat Assistants - not quite as effective, but free.)

MORE IN AI THIS WEEK

Your SOC 2 Compliance Checklist from Vanta

Are you building a business? Achieving SOC 2 compliance can help you win bigger deals, enter new markets and deepen trust with your customers — but it can also cost you real time and money.

Vanta automates up to 90% of the work for SOC 2 (along with other in-demand frameworks), getting you audit-ready in weeks instead of months. Save up to 400 hours and 85% of associated costs.

Download the free checklist to learn more about the SOC 2 compliance process and the road ahead. 

TRENDING AI TOOLS & SERVICES

  • Groq: serving the fastest AI responses you’ve ever seen

  • Adobe’s AI assistant: builds on Acrobat Liquid Mode to further unlock document intelligence with new capabilities in Reader and Acrobat

  • Made With Sora: curated gallery of prompts and videos generated by Sora

  • Global AI Regulation Tracker: website that tracks AI regulations around the world

  • Glif: remix any image on the web with a Chrome extension

  • MagiScan: creates realistic 3D models with your phone - downloadable in various file formats (like STL for 3D printing or USDZ for augmented reality applications)

  • KippyAI: your personal AI language tutor

  • KardsAI: turn any text into ready-to-use flashcards


VIDEOS, SOCIAL MEDIA & PODCASTS

  • Sora’s AI problems (and solutions) [YouTube]

  • Testing Gemini 1.5 and a 1 Million Token Window [YouTube]

  • How Sora works (a layman's explanation) [X]

  • Musk says Grok 1.5 releases in 2 weeks [X]

  • OpenAI/Sora is now on [TikTok]

  • Lex Fridman with Marc Raibert: Boston Dynamics and the future of robotics [Podcast]

  • (Discussion) Meta AI chief Yann LeCun doubles down on Sora opinion [Reddit]

  • (Discussion) Apple readies AI tool to rival Microsoft’s GitHub Copilot [Reddit]

TECHNICAL, RESEARCH & OPEN SOURCE

-------------------------

Google has just released Gemma, their first family of open models built from the same research and technology used to create Gemini. Gemma models are lightweight, coming in two sizes, Gemma 2B and Gemma 7B. They’re both available as pre-trained and instruction-tuned variants. Alongside these models, Google is releasing a Responsible Generative AI Toolkit and offering support across major frameworks like JAX, PyTorch, TensorFlow, and Hugging Face Transformers.

Gemma models share infrastructure components with Gemini models - Gemma 2B and 7B models are trained on 2T and 6T tokens of text, respectively. The data for training is primarily English from web documents, mathematics, and code. The models are designed for CPU and on-device applications (2B model) and GPU and TPU deployment (7B model). They’re compatible across laptops, desktops, IoT, mobile and cloud.

Gemma 7B appears to outperform both Llama-2 7B and Mistral 7B on several benchmarks, is available for commercial use, and has been optimized for Nvidia GPUs and (obviously) Vertex AI/Google Cloud.

-------------------------

Groq, an emerging AI chip company, is making headlines with its Language Processing Units (LPUs), which are setting new benchmarks for speed in language model applications, notably outperforming Nvidia’s best-in-class GPUs. Through a series of viral demonstrations, Groq has showcased its capability to dramatically accelerate the performance of large language models, making even the most advanced/turbo versions of many popular AI chatbots appear slow in comparison. The company asserts that its technology offers the fastest processing for large language models, a claim supported by third-party throughput evaluations.

Behind Groq’s innovation are its LPUs, designed to overcome the computational density and memory bandwidth bottlenecks commonly faced by GPUs and CPUs in AI model training and execution. Founded by Jonathan Ross, who previously co-founded Google’s AI chip division, Groq builds an inference engine that enhances the speed of existing AI chatbots rather than replacing them. The technology has shown potential to run ChatGPT over 13 times faster than current standards. Despite the buzz, the scalability of Groq’s AI chips remains an open question.

MORE IN T/R/OS

That’s it for this week! We’ll see you next Thursday.