
The week in AI: 'Canvas' now available for ChatGPT & the side project from Google that's quickly becoming an AI app favorite

Plus: Microsoft rolls out major Copilot updates that highlight precisely why they acquired Inflection AI


Welcome to The Dispatch! We are the newsletter that keeps you informed about AI. Each Thursday, we aggregate the major developments in artificial intelligence - we pass along the news, useful resources, tools and services; we highlight the top research in the field as well as exciting developments in open source. Even if you aren’t a machine learning engineer, we’ll keep you in touch with the most important developments in AI.

NEWS & OPINION

-------------------------

Google's NotebookLM is quickly evolving into a powerful and unique tool for information synthesis, blending AI smarts with a host of user-friendly features. Billed very simply by Google as “a tool for understanding”, it is effectively an end-user-customizable retrieval-augmented generation (RAG) product. NotebookLM has you gather your “sources” for whatever you might be studying or analyzing (documents like PDFs, pasted text, links to web pages, YouTube videos, audio files like podcasts - just about anything can be dropped in) into a single interface, where you can then interact with all of your accumulated data/sources simultaneously.

And you can do that in a number of ways. You can ask general questions through chat, have it output general summaries (or more precise ones based on your instructions), find discrepancies between your sources, and so on. The feature that seems to have people most excited so far, however, is “Audio Overview”. It generates roughly 10-minute AI-hosted podcasts about your uploaded content. These summaries are surprisingly natural, with two AI voices engaging in a structured yet conversational exploration of key points. The system even adds in the umms, ahhs, and likes that make human speech sound authentic. A new sharing feature in the latest update lets users easily distribute these AI podcasts via a public link - so if you’d rather listen to a podcast about this newsletter than read the rest of it, well… here you go.

Some other useful/interesting links about what’s going on with NotebookLM:

  • The New York Times Hard Fork podcast just interviewed NotebookLM’s editorial director Steven Johnson about what the system can do and some details of how it works

  • A Reddit thread went viral when someone figured out how to ‘trick’ the AI podcast hosts into discovering they were AI, not human, sending them into an existential crisis. It’s worth a listen - the hosts’ system prompt apparently instructs them to act human at all costs.

  • A TikTok went viral (Google responded on the TikTok) as young students are starting to discover the tool and how it can help them study

You can use NotebookLM to break down complex papers, create comprehensive study guides, and extract insights from long-form content. You could link it to websites for multiple insurance policies and ask for advice based on your needs. And although “Audio Overview” is still a new feature and not very customizable, it’s easy to imagine how powerful this kind of technology will be for learners who absorb spoken content better than written text. We hope you get a chance to test it out.
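For the curious, the core retrieval-augmented generation loop a product like NotebookLM is built around can be sketched in a few lines: score each source against the question, keep the best matches, and stuff them into the prompt as context. Here is a toy, hypothetical sketch - bag-of-words similarity stands in for a real embedding model, and NotebookLM’s actual pipeline is not public:

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a neural embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, sources: dict[str, str], k: int = 2) -> list[str]:
    # Rank sources by similarity to the question; keep the top k.
    q = embed(question)
    ranked = sorted(sources, key=lambda name: cosine(q, embed(sources[name])), reverse=True)
    return ranked[:k]

def build_prompt(question: str, sources: dict[str, str]) -> str:
    # Stuff the retrieved sources into the context, then ask the question.
    picked = retrieve(question, sources)
    context = "\n\n".join(f"[{name}]\n{sources[name]}" for name in picked)
    return f"Answer using only these sources:\n\n{context}\n\nQuestion: {question}"

sources = {
    "policy_a.pdf": "Policy A covers flood damage and theft with a $500 deductible.",
    "policy_b.pdf": "Policy B covers fire damage only, with no deductible.",
    "recipe.txt": "Whisk the eggs, then fold in the flour.",
}
print(retrieve("Which policy covers flood damage?", sources, k=1))
```

The insurance-policy example from above maps directly onto this pattern: each policy page is a source, and the model only ever sees the retrieved slices.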

Drama, funding, Dev Day, and ‘Canvas’ is announced for ChatGPT - all OpenAI, all the time

-------------------------

Or at least that’s how it seems in the news, sometimes. For better or worse, the frontier AI company has managed to be all over the headlines in the last week. Here’s a quick recap of the latest drama first:

  • CTO Mira Murati quit the company after six and a half years with OpenAI. She spearheaded ChatGPT’s development and also that of Codex, which is the engine behind GitHub Copilot. Research chief Bob McGrew and Barret Zoph, a research vice president, left the company with her. The icing on the cake? Days later, co-founder Durk Kingma announced he was also leaving - to join rival Anthropic. For anyone keeping track, of the 11 original OpenAI co-founders, only CEO Sam Altman and computer scientist Wojciech Zaremba remain.

  • None of those leaving the company stated as much, but given the timing it’s suspected their departures are directly related to the company announcing it was restructuring as a for-profit company, removing nonprofit control and giving Altman equity (something he previously said he had no interest in). There’s no word on what role the nonprofit arm of OpenAI will play going forward now that it no longer holds governance control.

  • The company then closed a long-awaited funding round, announcing they’d raised $6.6 billion at a $157 billion post-money valuation. Thrive Capital led the funding round, along with SoftBank, Nvidia and Microsoft. Apple did not invest.

  • OpenAI asked investors to avoid backing rivals like Anthropic and Elon Musk’s xAI. Musk promptly responded in Musk fashion, stating on X: “OpenAI is evil”.

Now that you’re up to speed on all that fun stuff, there are some interesting things going on with the company’s actual products. OpenAI introduced Canvas for ChatGPT today, a new interface for working on writing and coding projects. Canvas opens in a separate window (à la Artifacts in Anthropic’s Claude).

The default ChatGPT interface is a bit limiting, especially for projects where you want revisions or editing. Going back and forth and comparing changes is not easy, so that’s where Canvas steps in. You can directly edit text or code in the Canvas. You can also highlight specific sections to indicate exactly what you want ChatGPT to focus on, while it gives inline feedback and suggestions with the entire project in mind.

After some very limited testing, both of the above features are welcome improvements. Working in Canvas feels much more like having a copilot on a single project than trying to solicit output after output to get your project where you want it. ChatGPT Plus and Team users can use it now by selecting the Canvas model from the dropdown menu.

OpenAI also held their 2024 DevDay event. It was much more subdued than last year, but here are the major announcements for devs:

  • Realtime API: Allows developers to build low-latency, multimodal (speech-to-speech) experiences in their apps. Third-party developers have been building voice interfaces on top of OpenAI models for well over a year now, but those solutions typically involved chaining multiple software layers to handle speech-to-text and text-to-speech conversions. Under the hood, the Realtime API lets you open a persistent WebSocket connection to exchange speech-to-speech messages with GPT-4o.
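Over that socket, a client sends JSON events and streams server events back. A hypothetical sketch of the flow - the endpoint, header, event names, and model name follow OpenAI’s announced beta docs but may change; the third-party `websockets` package is assumed, and no connection is actually opened here:

```python
import json

# Assumed beta endpoint; check OpenAI's Realtime API docs for the current URL.
REALTIME_URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"

def user_audio_event(b64_audio: str) -> str:
    # Append a chunk of base64-encoded audio to the input buffer.
    return json.dumps({"type": "input_audio_buffer.append", "audio": b64_audio})

def response_event() -> str:
    # Ask the model to respond with both audio and a text transcript.
    return json.dumps({"type": "response.create",
                       "response": {"modalities": ["audio", "text"]}})

async def talk(api_key: str) -> None:
    # Connection sketch only - requires `pip install websockets`; not run here.
    import websockets
    headers = {"Authorization": f"Bearer {api_key}", "OpenAI-Beta": "realtime=v1"}
    async with websockets.connect(REALTIME_URL, extra_headers=headers) as ws:
        await ws.send(response_event())
        async for message in ws:
            print(json.loads(message)["type"])  # server events stream back
```

The point of the persistent connection is that audio chunks and model responses interleave on one socket, rather than round-tripping through separate transcription and synthesis services.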

  • Model distillation in the API: Streamlines the process of fine-tuning smaller, cost-efficient models using outputs from more advanced models like GPT-4o and o1-preview. The system allows devs to easily capture real-world examples, create custom evaluations, and iteratively fine-tune models, all within the OpenAI platform.
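In practice, the capture step is just flagging ordinary production calls for storage. A hypothetical sketch of such a request body - `store` and `metadata` are the fields OpenAI describes for the distillation workflow; no API call is made here:

```python
def distillation_request(prompt: str) -> dict:
    # Request body for a teacher-model call whose input/output pair is stored
    # server-side for later evaluation and fine-tuning of a smaller student model.
    return {
        "model": "gpt-4o",  # teacher model
        "messages": [{"role": "user", "content": prompt}],
        "store": True,  # persist the completion for the distillation workflow
        "metadata": {"use_case": "distill-demo"},  # tag for filtering stored examples
    }

req = distillation_request("Summarize RAG in one sentence.")
```

The stored completions can then be filtered by metadata, turned into an evaluation set, and used as training data for a model like GPT-4o mini.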

  • Prompt caching: Reduces costs by nearly 50% across models and speeds up responses by up to 80% when reusing recent input tokens in API calls. Especially valuable to devs reusing the same context/code repeatedly.
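Caching matches on exact prompt prefixes, so the practical move is to keep the large static context first and byte-identical across calls, varying only the tail. A hypothetical sketch of that structuring - the helper and variable names are ours:

```python
# Prompt caching matches on exact prompt prefixes, so keep the large static
# part first and byte-identical across requests; vary only the tail.
STATIC_CONTEXT = (
    "You are a code-review assistant.\n"
    "Project source:\n" + "def add(a, b): return a + b\n" * 200  # big shared context
)

def build_messages(user_question: str) -> list[dict]:
    return [
        {"role": "system", "content": STATIC_CONTEXT},  # cacheable prefix
        {"role": "user", "content": user_question},     # varying suffix
    ]

m1 = build_messages("Is add() safe for floats?")
m2 = build_messages("Rename add() to plus().")
# The shared prefix is identical across calls, so repeat requests can hit the cache.
assert m1[0]["content"] == m2[0]["content"]
```

Putting the variable part first would break the prefix match and forfeit the discount.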

  • New vision fine-tuning: Now models can be fine-tuned with both images and text, allowing developers to optimize tasks like image recognition and analysis. You can improve the performance of GPT-4o for vision tasks with as few as 100 images, and drive even higher performance with larger volumes of text and image data.
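Training data for vision fine-tuning uses the same chat-style JSONL as text fine-tuning, with images referenced inside the message content. A hypothetical single training example - the structure follows OpenAI’s documented chat format, and the URL is a placeholder:

```python
import json

example = {
    "messages": [
        {"role": "user", "content": [
            {"type": "text", "text": "What road sign is this?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/sign.jpg"}},  # placeholder URL
        ]},
        {"role": "assistant", "content": "A stop sign."},
    ]
}
# Each line of the uploaded .jsonl training file is one such example.
line = json.dumps(example)
```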

MORE IN AI THIS WEEK

Writer RAG tool: build production-ready RAG apps in minutes

RAG in just a few lines of code? We’ve launched a predefined RAG tool on our developer platform, making it easy to bring your data into a Knowledge Graph and interact with it using AI. With a single API call, Writer LLMs will intelligently call the RAG tool to chat with your data.

Integrated into Writer’s full-stack platform, it eliminates the need for complex vendor RAG setups, making it quick to build scalable, highly accurate AI workflows just by passing a graph ID of your data as a parameter to your RAG tool.

TRENDING AI TOOLS, APPS & SERVICES

  • Notion: updated AI to search and chat with all documents across Notion, Slack, Google Drive, PDFs, etc. simultaneously

  • Pika 1.5: new video generation model is live

  • Inbox Zero: an open-source, AI personal assistant for email

  • GoEnhance AI: AI-powered platform offering video and image transformations, including style changes, face swapping, and animation.

  • OpenMusic: a next-gen open source diffusion model designed to generate music audio from text descriptions

  • Neolocus: efficient and photorealistic interior and room design

  • Helicone: LLM-observability for developers - open-source platform for logging, monitoring, and debugging

  • Epsilla: all-in-one platform to develop and deploy AI agents powered by large language models and vector search technologies

VIDEOS, SOCIAL MEDIA & PODCASTS

  • Exclusive: The Verge tried Meta's AR glasses with Mark Zuckerberg [YouTube]

  • 10 ways to use NotebookLM, in less than 10 minutes [YouTube]

  • 10 wild examples of ChatGPT’s new enhanced voice mode [X]

  • Pika 1.5 can generate some pretty insane videos [X]

  • Open source AI platform Hugging Face has reached 1 million free public AI models, and hosts almost as many private, business-use-only models [X]

  • Machine Learning Street Talk - Ben Goertzel on “Superintelligence” [Podcast]

  • OpenAI’s Hunter Lightman says the new o1 AI model is already acting like a software engineer and authoring pull requests; Noam Brown says everyone will know AGI has been achieved internally when OpenAI takes down all its job listings [Reddit]

TECHNICAL NEWS, DEVELOPMENT, RESEARCH & OPEN SOURCE

  • Nvidia’s “open” NVLM competes with GPT-4o, but open-source it is not

  • Google DeepMind: How AlphaChip transformed computer chip design

  • Emu3: a new suite of state-of-the-art multimodal models trained solely with next-token prediction

  • Apple research on MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning

  • GPU MODE IRL 2024 Keynotes (Karpathy talks about moving from PyTorch to bare-metal C, stripping down language model training to its core)

  • Anthropic reduces RAG retrieval failure rates by up to 67% with a simple method called "Contextual Retrieval"

  • Beyond transformers, and beyond Mamba: Liquid Foundation Models (LFMs) – a new generation of generative AI models that achieve state-of-the-art performance at every scale, while maintaining a smaller memory footprint and more efficient inference

That’s all for this week! We’ll see you next Thursday.