• WeeklyDispatch.AI

The week in AI: Everything you need to know about AI 'agents', the newest wave in AI research

Plus: 'Stargate' - Microsoft's $100b AI supercomputer for OpenAI

Sponsored by

Welcome to The Dispatch! We are the newsletter that keeps you informed about AI. Each Thursday, we aggregate the major developments in artificial intelligence; we pass along the news, useful resources, tools and services, and highlight the top research in the field as well as exciting developments in open source. Even if you aren’t an engineer, we’ll keep you in touch with what’s going on in AI.

NEWS & OPINION

-------------------------

Research and breakthroughs in AI ‘agents’ have become a common sight in our technical section at the bottom of the newsletter, and this week we’re highlighting an excellent piece from Napkin Math on agentic workflow: what agents are, who some of the major players in their development are, and what the future potentially holds for this type of AI.

AI agents are entities designed to think and act independently in order to achieve specific goals. Agents symbolize a workflow architecture that involves multi-step processes that operate with minimal or no human intervention. These agents represent a level of autonomy and efficiency that could transform how many, many tasks are performed. Because they can optimize and execute individual steps within a broader task, AI agents look primed to pave the way for sophisticated applications of AI on a much bigger scale than anything we’ve seen so far with LLMs. The race to develop these agents involves not just creating more advanced models, but also integrating these models into products that can effectively solve real-world problems.

How effective are AI agents? Industry pioneer Andrew Ng highlighted some recent analysis from his team at DeepLearning.AI on coding-agent effectiveness:

“My team has been closely following the evolution of AI that writes code. We analyzed results from a number of research teams, focusing on an algorithm’s ability to do well on the widely used HumanEval coding benchmark.

GPT-3.5 (zero shot) was 48.1% correct. GPT-4 (zero shot) does better at 67.0%. However, the improvement from GPT-3.5 to GPT-4 is dwarfed by incorporating an iterative agent workflow. Indeed, wrapped in an agent loop, GPT-3.5 achieves up to 95.1%.”

That’s profound for AI development: through agentic iteration, you can achieve superior results with an inferior model. Expect to hear a lot about AI agents in the coming months.

-------------------------

If you needed any convincing about the looming dangers of AI-fueled manipulation, a New Atlas article highlighting recent Swiss research on AI’s current ability to manipulate and emerging ‘emotionally responsive’ AI might do the trick.

In a study involving 820 Swiss participants, GPT-4 demonstrated significant effectiveness in altering people's viewpoints on various issues, ranging from low-emotion topics to highly contentious matters. Participants engaged in text-based debates with both humans and GPT-4. Initially, GPT-4 achieved only a modest improvement vs. humans in persuading individuals. But the study took an intriguing turn when demographic details about participants (gender, age, race, education, political orientation, etc.) were provided along with instructions to tailor arguments accordingly. This approach diminished the effectiveness of human debaters, yet remarkably enhanced GPT-4's persuasive capabilities, making it 81.7% more effective than humans in altering viewpoints.

And that’s with just some basic information. The introduction of tools capable of detecting emotional nuances in voice tones, and other models that analyze facial expressions and movements, will only further escalate AI's ability to influence by leveraging our biological responses in their outputs. These advancements promise some beneficial applications, yet they obviously pose major risks of misuse for manipulation in advertising, law enforcement, and political campaigning, among other areas.

-------------------------

The US government has announced new global AI partnerships with both the UK and Japan.

US Commerce Secretary Gina Raimondo and UK Technology Secretary Michelle Donelan signed an agreement in Washington to jointly develop advanced AI model testing, following commitments announced at the AI Safety Summit in Bletchley Park in November. Under the formal partnership, Britain and the United States plan to perform joint testing exercises on publicly accessible models and are considering exploring personnel exchanges between AI safety institutes. Both are working to develop similar partnerships with other countries to promote AI safety. Although the UK has a much more hands-off approach to AI regulation than the EU, it has already committed $125 million to the UK AI Safety Institute. The US has so far committed $10 million to the US AI Safety Institute.

As Japanese PM Fumio Kishida gets set to meet with President Biden in Washington on April 10th, the two countries are expected to announce closer cooperation in both artificial intelligence and semiconductors. As part of the agreement, Japan and the US will likely set up a framework for AI research and development with tech giants Nvidia, Amazon, Arm and others. The US has moved aggressively to halt shipments of advanced AI chips/technology to China that could strengthen its military and sees Japan as an important friendshoring partner.

Additionally, President Biden announced that every federal agency must appoint a chief AI officer with ‘significant expertise in AI’. Some agencies have already appointed chief AI officers, but any agency that has not must appoint one within 60 days. Among many outlined responsibilities, chief AI officers in federal agencies will primarily focus on guiding AI initiatives, assessing and managing risks to ensure safety and compliance with ethical standards, and developing strategies for budget and workforce to leverage AI effectively.

MORE IN AI THIS WEEK

This 3-hour ChatGPT & AI Workshop will help you automate tasks & simplify your life using AI at no cost. (+ you get a bonus worth $500 on registering) 🎁

With AI & ChatGPT, you will be able to:

✅ Make smarter decisions based on data in seconds using AI 

✅ Automate daily tasks and increase productivity & creativity

✅ Solve complex business problems using the power of AI

✅ Build stunning presentations & create content in seconds

👉 Hurry! Click here to register (Limited seats: FREE for First 100 people only)🎁

TRENDING AI TOOLS & SERVICES

  • Replit Teams: brings the power of collaboration and AI to the workplace

  • Multi-On: Agent API now available

  • big-AGI: upgraded with ‘beam’ chat mode - multi-model AI reasoning

  • Ribbon: AI-powered job search tools

  • Canyon: apply, track, and prepare for jobs – all on one platform

  • CodeRabbit: cut code review time and bugs in half using AI

  • Magic Hour: all-in-one AI video creation platform that streamlines content production

  • deepinfra: run the top AI models using a simple pay per use API - low cost, scalable and production ready infrastructure

  • Upscayl: enlarge and enhance low-resolution images using advanced AI algorithms - free/open source

GUIDES, LISTS, PRODUCTS, UPDATES, USEFUL

VIDEOS, SOCIAL MEDIA & PODCASTS

  • But what is a GPT? Visual intro to Transformers [YouTube]

  • Why does OpenAI need a 'Stargate' supercomputer? Ft. Perplexity CEO Aravind Srinivas [YouTube]

  • AI pioneer Andrew Ng shows the power of AI agents - “The future is agentic” [YouTube]

  • (Discussion) Google Gemini's context window is much larger than anyone else's [Reddit]

  • NBA’s Indiana Pacers use Snapchat AI filters to make it look like Los Angeles Lakers fans were crying during game [X]

  • Google will update its TOS to introduce pay-as-you-go pricing for the Gemini API on May 2nd [X]

  • Should I be using AI right now? 10 practical tips from Professor Ethan Mollick [Podcast]

TECHNICAL, RESEARCH & OPEN SOURCE

-------------------------

Just a few weeks ago in this section, we featured Cognition AI’s Devin - the first AI software engineer. Now, Princeton’s Natural Language Processing (NLP) group has introduced SWE-agent, an open source system deploying GPT-4 to autonomously address issues in GitHub repositories.

SWE-agent achieves accuracy comparable to Devin on the SWE-bench (which evaluates LLMs on real-world software issues collected from GitHub). Given a codebase and an issue, a language model is then tasked with generating a patch that resolves the described problem.

SWE-agent resolves 12.29% of the GitHub issues in SWE-bench. That might not seem like much, but it showcases the potential of pairing agents with a well-designed agent-computer interface for debugging: current state-of-the-art LLMs on their own resolve only 2-4% of SWE-bench issues. SWE-agent has an average issue resolution time of 93 seconds. Users will need Docker and Miniconda for setup, and a GitHub token is required for operation.

-------------------------

With function calling, it’s becoming increasingly common to integrate LLMs into a wide range of applications and software systems. By calling the right functions/tools, LLMs can automate tasks, perform data analysis, quickly retrieve stored information - even control physical systems or devices.
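The integration pattern behind this is simple: the model emits a structured tool call (a function name plus JSON arguments), and the application routes it to real code. The sketch below hard-codes the tool-call payload for illustration; in practice it would come from a model's response, and `get_weather` is a hypothetical tool.

```python
# Sketch of the dispatch step in function calling: route a model-emitted
# JSON tool call (name + arguments) to a registered Python function.
import json

def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would hit a weather API."""
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}    # registry of callable tools

# What a model might emit when asked "What's the weather in Oslo?"
tool_call = {"name": "get_weather", "arguments": json.dumps({"city": "Oslo"})}

fn = TOOLS[tool_call["name"]]
result = fn(**json.loads(tool_call["arguments"]))
print(result)  # → Sunny in Oslo
```

In a full application, `result` would be sent back to the model as a tool message so it can compose its final answer from the returned data.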

And now there’s a function-calling leaderboard for LLMs. The Berkeley Function-Calling Leaderboard aims to provide a thorough analysis of the function-calling capability of different LLMs. It consists of a dataset of 2,000 question-function-answer pairs, covering various programming languages (Python, Java, JavaScript, SQL), application domains, and complex scenarios like multiple function calls and parallel function calls. Their metrics also evaluate the model's ability to detect when the provided functions are irrelevant to the given question.

In addition to the leaderboard itself, the researchers’ blog post discusses common mistakes made by LLMs in generating function calls, such as handling implicit parameter conversions and missing URLs in REST API calls. It also provides insights into when to use function calling versus prompting and presents model cards detailing the function-calling support and data type support of various LLMs.

MORE IN T/R/OS

That’s all for this week! We’ll see you next Thursday.