The week in AI: Google attempts to leapfrog both OpenAI and Apple with new Gemini features
Plus: Elon Musk's xAI releases Grok 2 - a state of the art LLM with uncensored image creation
Welcome to The Dispatch! We are the newsletter that keeps you informed about AI. Each Thursday, we aggregate the major developments in artificial intelligence - we pass along the news, useful resources, tools and services; we highlight the top research in the field as well as exciting developments in open source. Even if you aren’t a machine learning engineer, we’ll keep you in touch with the most important developments in AI.
NEWS & OPINION
-------------------------
Google has introduced the company's latest advancements in mobile devices and AI integration at their Made by Google and Google Pixel events this week.
The new Pixel 9 lineup, consisting of the Pixel 9, Pixel 9 Pro, and Pixel 9 Pro Fold, represents a significant upgrade for Google's smartphone offerings. On a non-AI note, the Pixel 9 Pro Fold, with its 8-inch Super Actua Flex display, indicates Google's continued investment in foldable phones.
Unsurprisingly, the events centered on the AI-powered software features that underpin these devices. While everyone has been waiting for ChatGPT’s advanced voice mode, Google seems to have beaten OpenAI to the punch with Gemini Live. Here are some of the key AI and software features introduced at the events:
Call Notes: An AI system that generates summaries of phone conversations. This Gemini version will run in Google’s cloud, in stark contrast to Apple’s commitment to on-device and private cloud AI. The payoff for Google is that this enables more complex functionality than what we saw from Apple Intelligence.
Gemini Live: Google's advanced conversational AI assistant. It offers interaction across multiple apps and includes 10 new voice options for customization. From TechCrunch: “Gemini Live responded to questions in less than two seconds, and was able to pivot fairly quickly when interrupted. Gemini Live is not perfect, but it’s the best way to use your phone hands-free that I’ve seen yet.”
Enhanced Android integration: The new Pixel devices feature deeper AI integration within the Android OS, including context-aware assistance and direct interaction with on-screen content.
Project Astra: A developing multimodal AI system designed to combine voice, vision, and text understanding for more comprehensive mobile assistance. Gemini Live is just the first step of Google’s vision for Project Astra.
During the events, live demonstrations highlighted both the potential and current limitations of these AI systems. There were occasional errors in responses and imperfect handling of interruptions, underscoring that these systems are still in a phase of ongoing development and refinement.
Google's vision for mobile computing puts AI at the center of the user experience. The company has been systematically embedding AI features into its suite of products and services, attempting to create a holistic, AI-driven user experience. It’s a sound business strategy because few competitors can match Google's comprehensive ecosystem of services and data sources. From search and email to maps, cloud storage - and now mobile hardware - Google's extensive reach allows for AI integration at multiple touchpoints in a user's digital life.
However, as we’ve noted, this strategy also raises important questions about data privacy, processing requirements, and the balance between on-device and cloud-based AI operations. Many of us would find it difficult to get comfortable with personal phone calls going to a Google server to be summarized.
-------------------------
A federal judge has allowed key parts of a class-action lawsuit against AI image generator companies (including Midjourney and Stability AI) to proceed, marking a significant step in the legal challenges facing the AI industry. The lawsuit alleges copyright infringement in the training of AI models using artists' work without consent. While some claims were previously dismissed, U.S. District Judge William Orrick permitted the core copyright infringement claims to move forward, enabling the case to enter the discovery phase.
The 33-page ruling represents one of the most advanced legal challenges to the generative AI industry's practice of using freely available internet content to train their models. The case, along with similar lawsuits involving writers and music labels, could potentially reshape the AI industry's approach to data sourcing and copyright. Legal experts noted that while this development is significant, the plaintiffs still face the challenge of proving meaningful copying of their work in AI training datasets and outputs. The outcome of this case could have far-reaching implications for the business model of generative AI companies and their data scraping practices.
-------------------------
xAI, the AI company founded by Elon Musk that recently raised $6 billion, has launched the second version of its large language model, Grok. It’s on par with most frontier AI models, even beating GPT-4o and Claude 3.5 Sonnet in some tasks. xAI also released a smaller Grok-2-mini version. Grok 2 is available only to X subscribers. If you’re counting, there are now five GPT-4 class models: GPT-4o, Claude 3.5, Gemini 1.5, Llama 3.1, and now Grok 2.
Considering how good the model is, it’s worth noting that Grok 3, which should be released before the end of the year, will be trained on the world’s largest GPU cluster - the 100,000 H100 GPU install base we covered two weeks ago.
Grok 2 also generates images. xAI partnered with Black Forest Labs, a company founded by the creators of the original Stable Diffusion, to use its FLUX.1 model, which is among the best image generators in the world right now. And of course - because this is Elon Musk/X we’re talking about - the image generator is not without controversy, as it allows users to create largely uncensored images.
MORE IN AI THIS WEEK
China’s Huawei readies new AI chip to challenge Nvidia, surmounting U.S. sanctions
Replika CEO Eugenia Kuyda says it’s okay if we end up marrying AI chatbots
JPMorgan Chase is giving its employees an AI assistant powered by ChatGPT maker OpenAI
These AI-powered nonprofits are making health care more equitable and effective
ChatGPT unexpectedly began speaking in a user’s cloned voice during testing
Lisa Su formally welcomes Silo AI team to AMD after completing $665 million acquisition
MIT releases comprehensive database of AI risks
Has your paper been used to train an AI model? Almost certainly
FCC cracks down on AI-generated voice calls
The Howard Hughes Medical Institute announced a $500 million investment over the next 10 years to support AI-driven projects in the life sciences
TRENDING AI TOOLS, APPS & SERVICES
Cosine: the world's best AI software engineer
Google Vids: create a video with AI in Google Vids (Workspace Labs)
Napkin: turns text into visuals with a bit of generative AI
Elevenstudios by ElevenLabs: fully managed video and podcast dubbing
BannerGPT: reads and understands your blog posts and creates meaningful illustrations to complement your writing
Tusk (YC W24): AI-created pull requests for annoying tickets
Renovate AI: plan your home renovation with AI
Feeling Great: AI mental wellness companion app
Omnifact: privacy-first AI platform built for businesses
Decover: AI-powered legal research
GUIDES, LISTS, PRODUCTS, UPDATES, INFORMATIVE
OpenAI updates ChatGPT to new model that exhibits multi-step reasoning
How I won $2,750 using JavaScript, AI, and a can of WD-40
The Google Pixel 9’s AI camera features let you reshape reality
Google Meet adds new note-taking AI
As Alexa turns 10, Amazon looks to generative AI
Grok can make images using Flux - 5 examples of it in action
My 3 favorite AI chatbot apps for iOS - and what you can do with them
VIDEOS, SOCIAL MEDIA & PODCASTS
Genie from Cosine shatters coding benchmark record [X]
Upwork integrated OpenAI, including ChatGPT Enterprise, across its operations [X]
Who’s winning the race to AI-powered drones, the US or China? by the Wall Street Journal [YouTube]
Gemini Live - Google beats OpenAI to true voice AI (launch breakdown) [YouTube]
Unreasonably effective AI with Google DeepMind CEO Demis Hassabis [Podcast]
(Discussion) Stanford's Erik Brynjolfsson says there was a period after Deep Blue beat Garry Kasparov at chess that humans + machines could still win but now humans add nothing to machine performance, and the same thing could happen with employment [Reddit]
TECHNICAL, DEVELOPMENT, RESEARCH & OPEN SOURCE
MultiOn launches Agent Q: a new breakthrough in AI agents/web navigation
Introducing SWE-bench Verified: OpenAI redesigns coding benchmark
Welcome FalconMamba: The first strong attention-free 7B model
Anthropic’s new ‘prompt caching’ for Claude will save developers a fortune
SingularityNET’s supercomputing network aims for community-led AGI, with 1st node coming online within weeks
How to reverse engineer the Substack (or any!) web API: building an unauthorized Substack client with Claude
Tokyo-based Sakana reveals an autonomous AI scientist
That’s all for this week! We’ll see you next Thursday.