The week in AI: Apple *finally* flexes some AI muscle
Plus: How ChatGPT thinks, explained
Welcome to The Dispatch! We are the newsletter that keeps you informed about AI. Each Thursday, we aggregate the week’s major developments in artificial intelligence: the news, useful resources, tools and services, the top research in the field, and exciting developments in open source. Even if you aren’t a machine learning engineer, we’ll keep you in touch with what matters most in AI.
NEWS & OPINION
-------------------------
While many tech giants have raced to push AI into every product and press release since the bombshell release of ChatGPT (then powered by GPT-3.5) in November 2022, Apple has consistently taken a far more measured and stealthy approach - famously avoiding the term entirely at their Worldwide Developers Conference in 2023.
During the first half of this week’s 2-hour WWDC 2024 keynote, it seemed like déjà vu - no mention of AI at all. And then, sans hype or fake demos, they revealed their vision for AI: Apple Intelligence, a partnership with OpenAI, and a host of new AI integrations and features coming to iOS 18, iPadOS 18, and macOS 15. Here’s everything you need to know:
Privacy: Apple Intelligence aims to set a new standard for privacy in AI through on-device processing integrated into iPhones, iPads, and Macs. This means your device can be aware of personal information without collecting it and storing it elsewhere. With the newly announced Private Cloud Compute, Apple Intelligence can draw on larger server-based models (running on Apple silicon) to handle more complex requests while still protecting your privacy (a hypothetical sketch of this on-device-first pattern follows this list).
Upgrades to Siri: Siri will gain onscreen awareness, allowing it to view and interact with what's displayed on your screen, and pull relevant data from your notes, emails, texts, and more to personalize responses. Additionally, Siri's upgraded conversational abilities and integration with advanced language models mean it will be able to remember context across requests and handle more complex tasks with fewer "Sorry, I didn't catch that" responses. You’ll be able to ask Siri to tailor and send emails, create notes, edit photos, and more - with future updates planned to allow app developers to integrate even more actions.
New AI features: Apps like Mail, Messages, and Notes will have built-in writing tools that allow users to quickly auto-generate and edit text. Mail will utilize AI to better organize inboxes; Notes and Phone will both gain new audio transcription (and summarization) capabilities. A new "Image Playground" feature introduces an AI image generator, and AI-crafted ‘Genmojis’ turn text prompts into personalized emojis. Photos will get more conversational search abilities and new editing and ‘storytelling’ tools.
OpenAI integration: The partnership with OpenAI will allow Siri to leverage ChatGPT/GPT-4o when needed for more complex questions. OpenAI’s blog post also outlined further tools, like image generation and document understanding, embedded into the new iOS.
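For a concrete picture of that privacy split, here’s a purely hypothetical sketch of on-device-first routing, written in Python for illustration. None of these function names are real Apple APIs, and the routing test is invented; Apple has published claims about Private Cloud Compute (Apple-silicon servers, stateless processing, no data retention), not code.

```python
# Purely hypothetical illustration of the on-device-first pattern Apple
# describes. Nothing below is a real Apple API; names and the routing
# test are invented for illustration only.

def run_on_device_model(prompt: str) -> str:
    # Personal context (mail, notes, messages) would stay on the device;
    # nothing is uploaded or stored elsewhere.
    return f"[on-device] handled: {prompt!r}"

def run_private_cloud_compute(prompt: str) -> str:
    # Larger models on Apple-silicon servers; per Apple's claims, requests
    # are processed statelessly and data is never retained.
    return f"[Private Cloud Compute] handled: {prompt!r}"

def handle(prompt: str, needs_large_model: bool) -> str:
    # Simple requests stay local; only harder ones escalate to the cloud.
    if not needs_large_model:
        return run_on_device_model(prompt)
    return run_private_cloud_compute(prompt)

print(handle("Summarize my unread emails", needs_large_model=False))
print(handle("Plan a trip mixing my notes with web info", needs_large_model=True))
```

The design intent, per Apple, is that escalation to the server is the exception rather than the default - and that the server keeps nothing.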
Apple’s approach to partnerships with other tech giants is notably pragmatic. In addition to the OpenAI partnership, they confirmed future plans to partner with Google on Gemini - and, per The Information, are considering a partnership with OpenAI rival Anthropic - all while developing and enhancing their own language models.
Upon hearing news of Apple’s partnership with OpenAI, Elon Musk threatened to ban the use of all Apple devices from his companies, claiming that the integration of ChatGPT into operating systems poses an “unacceptable security violation”. He then said his company’s visitors will need to check Apple devices at the door and store them in a “Faraday cage”.
Apple shares closed at a record high during WWDC, and they’re currently battling closely with Microsoft for the title of most valuable company in the world - again.
-------------------------
In October 2022, the United States began imposing a sweeping series of export controls on advanced chips (and the manufacturing equipment required to make them) against the People’s Republic of China. Advanced chips, particularly their use in emergent AI capabilities for intelligence and military applications, are seen as a major national security issue.
This time around, the US is targeting gate-all-around (GAA) technology - an advanced new transistor architecture that greatly enhances cutting-edge chip performance. The most advanced chipmakers are already using GAA for 3-nanometer (and soon, 2-nanometer) chips, while China’s most advanced fabs are only now manufacturing at 5 nanometers.
Even with China trailing on process nodes, the effectiveness of these export controls is highly debatable. China has shown remarkable dedication and resilience in developing and manufacturing advanced chips without reliance on other countries - a claim the US can’t currently make (although Intel is trying; the most advanced AI chips are not currently manufactured on US soil). Further, it has recently been revealed that Chinese companies are exploiting loopholes in the export controls by remotely accessing GPUs located in the US.
And speaking of the effectiveness of these export controls around China’s AI development…
-------------------------
In February, OpenAI shocked the world by showcasing the capabilities of their text-to-video model, Sora. At the time, it was seen as a huge advance over nascent text-to-video models like Runway’s Gen-2 and Google’s Lumiere.
Given what we’ve covered above, it might then come as a surprise that only a few months later, Chinese tech firm Kuaishou has introduced KLING, a new text-to-video AI model capable of generating high-quality videos with outputs that appear to rival and occasionally surpass the still-unreleased Sora.
KLING can produce videos at 1080p resolution with a maximum length of two minutes, surpassing the one-minute maximum of the Sora videos demoed by OpenAI. The KLING demos include realistic outputs of animals, people eating food, and scenic shots, as well as surreal clips like animals wearing clothes or driving vehicles.
From our research, this appears to be a legitimate text-to-video project/model: Kuaishou was China's first short-video platform and still has a massive user base (hundreds of millions of users globally), giving them proprietary access to a huge repository of short videos for training a model like this. It’s worth noting that despite the claim of two-minute video generation, most of the demo clips are only a few seconds long.
The model is currently available to Chinese-based users as a public demo on the KWAI iOS app. In related news, Luma Labs just released their Dream Machine text-to-video generator, which is available for use now in the US.
MORE IN AI THIS WEEK
Elon Musk drops lawsuit after OpenAI published his emails
This is what it looks like when AI eats the world
The smart, cheap fix for slow, dumb traffic lights
Fake beauty queens charm judges at the Miss AI pageant
Buzzy AI search engine Perplexity is directly ripping off content from news outlets
How to build a DOA product: Humane AI Pin founders banned internal criticism
A social app for creatives, Cara grew from 40k to 650k users in a week because artists are fed up with Meta’s AI policies
Paris-based AI startup Mistral AI raises $640M
Brazil hires OpenAI to cut costs of court battles
Bloomberg profiles Sam Altman: bending the world to his will long before OpenAI
TRENDING AI TOOLS, APPS & SERVICES
Dream Machine from Luma Labs: new text-to-video generator
PDF Translate: 100+ languages supported - translate PDF documents while preserving layout
Udio: AI music generator - upload a sound and let AI generate a song
Driver AI: explains millions of lines of code in minutes instead of months
Jenni: supercharge your next research paper - AI-powered text editor helps you write, edit, and cite with confidence
Moltar: turn hours of homework into minutes
Biread: transform any website content into bilingual text with a single click
ChainGPT: expert AI assistant on crypto and blockchain
Khroma: uses AI to learn which colors you like and creates limitless palettes for you to discover, search, and save
GUIDES, LISTS, PRODUCTS, UPDATES, INTERESTING
Meta launched a new AI assistant for businesses in WhatsApp
ARC Prize is a $1,000,000+ public competition to beat tech giants and open source an AGI solution
Microsoft pulls release preview build of Windows 11 24H2 after Recall feature controversy and sends Copilot Pro's GPT Builder to the digital dumpster
The 7 best Microsoft Copilot prompts
Your next job interview might be with an AI recruiter
Michael Kors first to debut Shopping Muse, the AI-powered shopping assistant from Mastercard
Google’s June updates for Pixel: Gemini comes to more Pixel phones
VIDEOS, SOCIAL MEDIA & PODCASTS
FTC Chair Lina Khan shares how the agency is looking at AI [YouTube]
Former OpenAI engineer Andrej Karpathy: Let's reproduce GPT-2 (124M) [YouTube]
WWDC 2024 Recap: Is Apple Intelligence Legit? [YouTube]
Hugging Face open source LLM Leaderboard: Alibaba’s open-source model Qwen moves into top spot ahead of Mistral and Llama 3 [X]
Canadian Prime Minister Justin Trudeau says we should not slow down AI development based on dystopian sci-fi scenarios [X]
OpenAI CTO says models in labs not much better than what the public has already [Reddit]
Exactly 6 years ago GPT-1 was released [Reddit]
How AI is eating finance - with Mike Conover of Brightwave [Podcast]
TECHNICAL, RESEARCH & OPEN SOURCE
-------------------------
LLMs are often labeled ‘black boxes’ - it’s currently beyond our reach to understand exactly how these models produce their outputs. An entire field, mechanistic interpretability, is dedicated to this endeavor, and some researchers are getting very creative in their attempts to analyze LLMs in a more structured way, similar to how we analyze control systems in engineering. At the tail end of last week, OpenAI published new research that peers more closely into the box, revealing how GPT-4 processes and represents information internally. Here are some of the highlights from the 34-page paper:
Decomposing GPT-4’s brain: The researchers used sparse autoencoders to dissect GPT-4’s internal representations. These autoencoders identify and isolate the patterns, or "features", the model uses to understand and generate language - and they found 16 million of them (a toy code sketch follows these highlights).
Interpretable features unveiled: The study also highlighted several different types of features GPT-4 encodes, ranging from mathematical concepts like algebraic rings to more abstract notions such as rhetorical questions and human imperfections (!).
Quantitative metrics for quality: OpenAI developed methods to not only detect but also evaluate the quality of the features extracted from GPT-4. They used three main metrics: Recovery of Hypothesized Features checks if the model can accurately identify expected concepts like “economic inflation”; Sparsity of Downstream Effects measures whether tweaking one feature minimally impacts others, ensuring each feature’s influence is clear and contained; and Explainability of Activation Patterns rates how easily someone can understand what each feature does, such as recognizing a feature that triggers for questions about geography. As the size of the autoencoder increases, these metrics improve - indicating that larger models are better at both capturing diverse concepts and making them understandable.
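To make "sparse autoencoder" concrete, here’s a toy sketch in PyTorch. It’s a minimal illustration under stated assumptions - random stand-in activations, small dimensions, and a simple top-k sparsity rule like the one the paper describes - not OpenAI’s released code, which scales the same idea to 16 million latents trained on GPT-4’s real activations.

```python
# Toy top-k sparse autoencoder: learn to reconstruct model activations
# from a small number of active "features". Sizes and data are illustrative.
import torch
import torch.nn as nn

class TopKSparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, n_latents: int, k: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_latents)
        self.decoder = nn.Linear(n_latents, d_model)
        self.k = k  # how many latents may fire per input

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        latents = torch.relu(self.encoder(x))
        # Keep only the k strongest latents and zero the rest; this enforced
        # sparsity is what makes individual latents readable as "features".
        top = torch.topk(latents, self.k, dim=-1)
        return torch.zeros_like(latents).scatter_(-1, top.indices, top.values)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encode(x))

# Toy training loop: minimize reconstruction error on stand-in activations.
sae = TopKSparseAutoencoder(d_model=768, n_latents=4096, k=32)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
acts = torch.randn(4096, 768)  # stand-in for real residual-stream activations
for step in range(200):
    loss = (sae(acts) - acts).pow(2).mean()  # reconstruction error
    opt.zero_grad()
    loss.backward()
    opt.step()

# One simple quality signal, echoing the metrics above: how sparse are the
# learned features? With top-k, at most k latents fire per input by design.
l0 = (sae.encode(acts) != 0).float().sum(dim=-1).mean()
print(f"mean active latents per input: {l0.item():.1f}")  # ~32
```

The design point worth noticing: forcing only k latents to fire makes each one accountable for something specific, which is what lets researchers attach labels like "algebraic rings" or "rhetorical questions" to individual features.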
OpenAI’s research team has made their code, autoencoders, and a feature visualizer available to the public, helping other researchers explore and build upon these findings - and, hopefully, furthering our understanding of LLMs and their capabilities.
MORE IN T/R/OS:
(Anthropic research) On developing Claude’s personality/character
(Google DeepMind research) Open-endedness is essential for artificial superhuman intelligence
Hello Qwen2: new top open source LLM from China’s Alibaba
Agentic: AI agent stdlib that works with any LLM and TypeScript AI SDK
Thread: Jupyter Notebook that combines the experience of OpenAI's code interpreter with the familiar development environment of a Python notebook
Flash Diffusion from Jasper AI: accelerating any conditional diffusion model for few steps image generation
AI in software engineering at Google: Progress and the path ahead
That’s all for this week. We’ll see you next Thursday.