• WeeklyDispatch.AI

The week in AI: Everything you need to know about AI 'agents', the newest wave in AI research

Plus: 'Stargate' - Microsoft's $100b AI supercomputer for OpenAI

Sponsored by

Welcome to The Dispatch! We are the newsletter that keeps you informed about AI. Each Thursday, we aggregate the major developments in artificial intelligence; we pass along the news, useful resources, tools and services, and highlight the top research in the field as well as exciting developments in open source. Even if you aren’t an engineer, we’ll keep you in touch with what’s going on in AI.

NEWS & OPINION

-------------------------

Research and breakthroughs in AI ‘agents’ have become a common sight in our technical section at the bottom of the newsletter, and this week we’re highlighting an excellent piece from Napkin Math on agentic workflow: what agents are, who some of the major players in their development are, and what the future potentially holds for this type of AI.

AI agents are entities designed to think and act independently in order to achieve specific goals. Agents symbolize a workflow architecture that involves multi-step processes that operate with minimal or no human intervention. These agents represent a level of autonomy and efficiency that could transform how many, many tasks are performed. Because they can optimize and execute individual steps within a broader task, AI agents look primed to pave the way for sophisticated applications of AI on a much bigger scale than anything we’ve seen so far with LLMs. The race to develop these agents involves not just creating more advanced models, but also integrating these models into products that can effectively solve real-world problems.

How effective are AI agents? Industry pioneer Andrew Ng highlighted some recent analysis from his team at DeepLearning.AI on coding-agent effectiveness:

“My team has been closely following the evolution of AI that writes code. We analyzed results from a number of research teams, focusing on an algorithm’s ability to do well on the widely used HumanEval coding benchmark.

GPT-3.5 (zero shot) was 48.1% correct. GPT-4 (zero shot) does better at 67.0%. However, the improvement from GPT-3.5 to GPT-4 is dwarfed by incorporating an iterative agent workflow. Indeed, wrapped in an agent loop, GPT-3.5 achieves up to 95.1%.”

That’s profound for AI development: through agentic iteration, you can achieve superior results with an inferior model. Expect to hear a lot about AI agents in the coming months.

-------------------------

If you needed any convincing about the looming dangers of AI-fueled manipulation, a New Atlas article highlighting recent Swiss research on AI’s current ability to manipulate and emerging ‘emotionally responsive’ AI might do the trick.

In a study involving 820 Swiss participants, GPT-4 demonstrated significant effectiveness in altering people's viewpoints on various issues, ranging from low-emotion topics to highly contentious matters. Participants engaged in text-based debates with both humans and GPT-4. Initially, GPT-4 achieved only a modest improvement vs. humans in persuading individuals. But the study took an intriguing turn when demographic details about participants (gender, age, race, education, political orientation, etc.) were provided along with instructions to tailor arguments accordingly. This approach diminished the effectiveness of human debaters, yet remarkably enhanced GPT-4's persuasive capabilities, making it 81.7% more effective than humans in altering viewpoints.

And that’s with just some basic information. The introduction of tools capable of detecting emotional nuances in voice tones, and other models that analyze facial expressions and movements, will only further escalate AI's ability to influence by leveraging our biological responses in their outputs. These advancements promise some beneficial applications, yet they obviously pose major risks of misuse for manipulation in advertising, law enforcement, and political campaigning, among other areas.

-------------------------

The US government has announced new global AI partnerships with both the UK and Japan.

US Commerce Secretary Gina Raimondo and UK Technology Secretary Michelle Donelan signed an agreement in Washington to jointly develop advanced AI model testing, following commitments announced at the AI Safety Summit in Bletchley Park in November. Under the formal partnership, Britain and the United States plan to perform joint testing exercises on publicly accessible models and are considering exploring personnel exchanges between AI safety institutes. Both are working to develop similar partnerships with other countries to promote AI safety. Although the UK has a much more hands-off approach to AI regulation than the EU, it has already committed $125 million to the UK AI Safety Institute. The US has so far committed $10 million to the US AI Safety Institute.

As Japanese PM Fumio Kishida gets set to meet with President Biden in Washington on April 10th, the two countries are expected to announce closer cooperation in both artificial intelligence and semiconductors. As part of the agreement, Japan and the US will likely set up a framework for AI research and development with tech giants Nvidia, Amazon, Arm and others. The US has moved aggressively to halt shipments of advanced AI chips/technology to China that could strengthen its military and sees Japan as an important friendshoring partner.

Additionally, President Biden announced that every federal agency must appoint a chief AI officer with ‘significant expertise in AI’. Some agencies have already appointed chief AI officers, but any agency that has not must appoint one within 60 days. Among many outlined responsibilities, chief AI officers in federal agencies will primarily focus on guiding AI initiatives, assessing and managing risks to ensure safety and compliance with ethical standards, and developing strategies for budget and workforce to leverage AI effectively.

MORE IN AI THIS WEEK

This 3-hour ChatGPT & AI Workshop will help you automate tasks & simplify your life using AI at no cost. (+ you get a bonus worth $500 on registering) 🎁

With AI & ChatGPT, you will be able to:

✅ Make smarter decisions based on data in seconds using AI 

✅ Automate daily tasks and increase productivity & creativity

✅ Solve complex business problems using the power of AI

✅ Build stunning presentations & create content in seconds

👉 Hurry! Click here to register (Limited seats: FREE for First 100 people only)🎁

TRENDING AI TOOLS & SERVICES

  • Replit Teams: brings the power of collaboration and AI to the workplace

  • Multi-On: Agent API now available

  • big-AGI: upgraded with ‘beam’ chat mode - multi-model AI reasoning

  • Ribbon: AI-powered job search tools

  • Canyon: apply, track, and prepare for jobs – all on one platform

  • CodeRabbit: cut code review time and bugs in half using AI

  • Magic Hour: all-in-one AI video creation platform that streamlines content production

  • deepinfra: run the top AI models using a simple pay per use API - low cost, scalable and production ready infrastructure

  • Upscayl: enlarge and enhance low-resolution images using advanced AI algorithms - free/open source

GUIDES, LISTS, PRODUCTS, UPDATES, USEFUL

VIDEOS, SOCIAL MEDIA & PODCASTS

  • But what is a GPT? Visual intro to Transformers [YouTube]

  • Why does OpenAI need a 'Stargate' supercomputer? Ft. Perplexity CEO Aravind Srinivas [YouTube]

  • AI pioneer Andrew Ng shows the power of AI agents - “The future is agentic” [YouTube]

  • (Discussion) Google Gemini's context window is much larger than anyone else's [Reddit]

  • NBA’s Indiana Pacers use Snapchat AI filters to make it look like Los Angeles Lakers fans were crying during game [X]

  • Google will update its TOS to introduce pay-as-you-go pricing for the Gemini API on May 2nd [X]

  • Should I be using AI right now? 10 practical tips from Professor Ethan Mollick [Podcast]

TECHNICAL, RESEARCH & OPEN SOURCE

-------------------------

Just a few weeks ago in this section, we featured Cognition AI’s Devin - the first AI software engineer. Now, Princeton’s Natural Language Processing (NLP) group has introduced SWE-agent, an open source system deploying GPT-4 to autonomously address issues in GitHub repositories.

SWE-agent achieves accuracy comparable to Devin on the SWE-bench (which evaluates LLMs on real-world software issues collected from GitHub). Given a codebase and an issue, a language model is then tasked with generating a patch that resolves the described problem.

SWE-agent resolves 12.29% of the GitHub issues in SWE-bench. That might not seem like much, but it showcases the potential of pairing agents with a well-designed agent-computer interface for debugging: current state-of-the-art LLMs on their own resolve only 2-4% of SWE-bench issues. SWE-agent has an average issue resolution time of 93 seconds. Users will need Docker and Miniconda for setup, and a GitHub token is required for operation.

-------------------------

With function calling, it’s becoming increasingly common to integrate LLMs into a wide range of applications and software systems. By calling the right functions/tools, LLMs can automate tasks, perform data analysis, quickly retrieve stored information - even control physical systems or devices.
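The integration pattern behind this is simple: the model emits a structured tool call (a function name plus JSON arguments), and the application routes it to real code. The sketch below hard-codes the tool-call payload for illustration; in practice it would come from a model's response, and `get_weather` is a hypothetical tool.

```python
# Sketch of the dispatch step in function calling: route a model-emitted
# JSON tool call (name + arguments) to a registered Python function.
import json

def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would hit a weather API."""
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}    # registry of callable tools

# What a model might emit when asked "What's the weather in Oslo?"
tool_call = {"name": "get_weather", "arguments": json.dumps({"city": "Oslo"})}

fn = TOOLS[tool_call["name"]]
result = fn(**json.loads(tool_call["arguments"]))
print(result)  # → Sunny in Oslo
```

In a full application, `result` would be sent back to the model as a tool message so it can compose its final answer from the returned data.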

And now there’s a function-calling leaderboard for LLMs. The Berkeley Function-Calling Leaderboard aims to provide a thorough analysis of the function-calling capability of different LLMs. It consists of a dataset of 2,000 question-function-answer pairs, covering various programming languages (Python, Java, JavaScript, SQL), application domains, and complex scenarios like multiple function calls and parallel function calls. Their metrics also evaluate the model's ability to detect when the provided functions are irrelevant to the given question.

In addition to the leaderboard itself, the researchers’ blog post discusses common mistakes made by LLMs in generating function calls, such as handling implicit parameter conversions and missing URLs in REST API calls. It also provides insights into when to use function calling versus prompting and presents model cards detailing the function-calling support and data type support of various LLMs.

MORE IN T/R/OS

That’s all for this week! We’ll see you next Thursday.