More from Google I/O 2024

More from Google I/O: Google unveils AI Agent project Astra, Video Generation Model Veo, 2M token Gemini 1.5 Pro and more

-------------------------

On top of the announcements highlighted in our e-mail, the Google I/O conference introduced:

Veo - Google DeepMind's most capable video generation model to date. It generates high-quality 1080p videos that can run longer than a minute, in a wide range of cinematic and visual styles.

  • Veo can take an image or a video as input, along with a text prompt. Given an image, it can animate it; given a video, it can edit it. It also supports masked editing: add a mask area to your video and text prompt, and the changes apply only to that region of the video.

  • On the technical side, Google shared that they added more detail to the captions of each video in Veo's training data. The model uses high-quality, compressed representations of video (known as latents) to improve performance, generation speed and efficiency.

Astra - Google's new project focused on building a future AI assistant - for everything.

  • Google's new assistant is powered by Gemini and supports audio, text, video and image input shared in real time. Google still presents the project as a prototype, and Astra's capabilities were only shown through pre-recorded videos since it is not yet available to all users.

  • Early testers report higher latency and less emotional intelligence and tonal range for Astra than for GPT-4o, but strong text-to-speech, potentially better handling of ongoing video and, for sure, better long-context support. Google is leading the long-context game right now.

Google unveiled two updates to its flagship model family: the new Gemini 1.5 Flash and an upgraded Gemini 1.5 Pro.

  • Gemini 1.5 Flash is the lightweight, fast and cost-efficient version of the model. It is also multimodal and has a 1M token context length. The performance cost is small, about 3 percentage points on MMLU: 78.9% compared to 81.9% for the original Gemini 1.5 Pro model.

  • Gemini 1.5 Pro had its context length doubled to 2M tokens. The new model is available via a waitlist for select developers building through the API.

Imagen 3 - Google's most capable image generation model, which will be available in multiple versions, each optimized for different types of tasks, from generating quick sketches to high-resolution images.

Gemma 2 and PaliGemma - two new open-source models added to the Gemma family. PaliGemma is Google's first open-source vision-language model and is available now. Gemma 2 is a 27B-parameter model that outperforms the previous version and will be available starting in June.

Veo, Astra and the 2M-context version of Gemini 1.5 Pro are not available yet, but you can join the waitlist to get access.