The week in AI: Google attempts to leapfrog both OpenAI and Apple with new Gemini features

Plus: Elon Musk's xAI releases Grok 2 - a state-of-the-art LLM with uncensored image creation

Welcome to The Dispatch! We are the newsletter that keeps you informed about AI. Each Thursday, we aggregate the major developments in artificial intelligence - we pass along the news, useful resources, tools and services; we highlight the top research in the field as well as exciting developments in open source. Even if you aren’t a machine learning engineer, we’ll keep you in touch with the most important developments in AI.

NEWS & OPINION

-------------------------

Google has introduced the company's latest advancements in mobile devices and AI integration at their Made by Google and Google Pixel events this week.

The new Pixel 9 lineup, consisting of the Pixel 9, Pixel 9 Pro, and Pixel 9 Pro Fold, represents a significant upgrade for Google's smartphone offerings. On a non-AI note, the Pixel 9 Pro Fold, with its 8-inch Super Actua Flex display, indicates Google's continued investment in foldable phones.

Unsurprisingly, the events centered on the AI-powered software features that underpin these devices. While everyone has been waiting for ChatGPT’s advanced voice mode, Google seems to have beaten OpenAI to the punch with Gemini Live. Here are some of the key AI and software features introduced at the events:

  • Call Notes: An AI system that generates summaries of phone conversations. This Gemini version will run in Google’s cloud, and is a stark contrast to Apple’s commitment to on-device and private cloud AI. The payoff for Google is that this enables more complex functionalities than what we saw from Apple Intelligence.

  • Gemini Live: Google's advanced conversational AI assistant. It offers interaction across multiple apps and includes 10 new voice options for customization. From TechCrunch: “Gemini Live responded to questions in less than two seconds, and was able to pivot fairly quickly when interrupted. Gemini Live is not perfect, but it’s the best way to use your phone hands-free that I’ve seen yet.”

  • Enhanced Android integration: The new Pixel devices feature deeper AI integration within the Android OS, including context-aware assistance and direct interaction with on-screen content.

  • Project Astra: A developing multimodal AI system designed to combine voice, vision, and text understanding for more comprehensive mobile assistance. Gemini Live is just the first step of Google’s vision for Project Astra.

During the events, live demonstrations highlighted both the potential and current limitations of these AI systems. Occasional errors in responses and imperfect handling of interruptions showed that the systems are still under active development.

Google's vision for mobile computing puts AI at the center of the user experience. The company has been systematically embedding AI features into its suite of products and services, attempting to create a holistic, AI-driven user experience. It’s a sound business strategy because few competitors can match Google's comprehensive ecosystem of services and data sources. From search, email, maps, and cloud storage to mobile hardware, Google's extensive reach allows for AI integration at multiple touchpoints in a user's digital life.

However, as we’ve noted, it also brings forth important considerations regarding data privacy, processing requirements, and the balance between on-device and cloud-based AI operations. It would be difficult for many of us to get comfortable with personal phone calls going to a Google server to be summarized.

-------------------------

A federal judge has allowed key parts of a class-action lawsuit against AI image generator companies (including Midjourney and Stability AI) to proceed, marking a significant step in the legal challenges facing the AI industry. The lawsuit alleges copyright infringement in the training of AI models using artists' work without consent. While some claims were previously dismissed, U.S. District Judge William Orrick permitted the core copyright infringement claims to move forward, enabling the case to enter the discovery phase.

The 33-page ruling represents one of the most advanced legal challenges to the generative AI industry's practice of using freely available internet content to train their models. The case, along with similar lawsuits involving writers and music labels, could potentially reshape the AI industry's approach to data sourcing and copyright. Legal experts noted that while this development is significant, the plaintiffs still face the challenge of proving meaningful copying of their work in AI training datasets and outputs. The outcome of this case could have far-reaching implications for the business model of generative AI companies and their data scraping practices.

-------------------------

xAI, the AI company founded by Elon Musk that recently raised $6 billion, has launched the second version of its large language model, Grok. It’s on par with most frontier AI models, even beating GPT-4o and Claude 3.5 Sonnet in some tasks. xAI also released a smaller Grok-2-mini version. Grok 2 is available only to X subscribers. If you’re counting, there are now five GPT-4 class models: GPT-4o, Claude 3.5, Gemini 1.5, Llama 3.1, and now Grok 2.

Considering how good the model is, it’s worth noting that Grok 3, which should be released before the end of the year, will be trained on the world’s largest GPU cluster - the 100,000 H100 GPU install base we covered two weeks ago.

Grok 2 also generates images. xAI partnered with Black Forest Labs, a company founded by the creators of the original Stable Diffusion, to use its FLUX.1 model, which is among the best image generators in the world right now. And of course - because this is Elon Musk/X we’re talking about - the image generator is not without controversy, as it allows users to create largely uncensored images.

MORE IN AI THIS WEEK

TRENDING AI TOOLS, APPS & SERVICES

  • Cosine: the world's best AI software engineer

  • Google Vids: create a video with AI in Google Vids (Workspace Labs)

  • Napkin: turns text into visuals with a bit of generative AI

  • Elevenstudios by ElevenLabs: fully managed video and podcast dubbing

  • BannerGPT: reads and understands your blog posts and creates meaningful illustrations to complement your writing

  • Tusk (YC W24): AI-created pull requests for annoying tickets

  • Renovate AI: plan your home renovation with AI

  • Feeling Great: AI mental wellness companion app

  • Omnifact: privacy-first AI platform built for businesses

  • Decover: AI-powered legal research

VIDEOS, SOCIAL MEDIA & PODCASTS

  • Genie from Cosine shatters coding benchmark record [X]

  • Upwork integrated OpenAI, including ChatGPT Enterprise, across its operations [X]

  • Who’s winning the race to AI-powered drones, the US or China? by the Wall Street Journal [YouTube]

  • Gemini Live - Google beats OpenAI to true voice AI (launch breakdown) [YouTube]

  • Unreasonably effective AI with Google DeepMind CEO Demis Hassabis [Podcast]

  • (Discussion) Stanford's Erik Brynjolfsson says there was a period after Deep Blue beat Garry Kasparov at chess that humans + machines could still win but now humans add nothing to machine performance, and the same thing could happen with employment [Reddit]

That’s all for this week! We’ll see you next Thursday.