
March 24, 2025

Choosing the Right Agentic Framework II

We finish grading six of the most popular agentic frameworks: LangChain’s LangGraph, Microsoft’s AutoGen, Pydantic’s PydanticAI, CrewAI, OpenAI’s Swarm, and Hugging Face’s Smolagents.

In the first part of this series, we walked through the setup of our multi-agent spam classification system and defined key terminology like message passing, state management, and tools. Now, it’s time to compare how these frameworks perform in practice.

Here, we’ll break down LangGraph, AutoGen, PydanticAI, CrewAI, Swarm, and Smolagents across five critical factors: message passing, state management, tool calling, quality of documentation, and ease of use. Whether you’re building a production-ready system, prototyping quickly, or creating a personal agentic tool, this comparison will help you pick the right framework for the job.

1. Message Passing

LangGraph, PydanticAI (A+): Message passing was consistent, with orderly agent flow and no issues handling the feedback loop. LangGraph’s directed graph workflow (agents as nodes, handoffs defined by edges) and conditional edges seamlessly managed execution order, especially for GPT-BERT disagreements. In PydanticAI, each agent had to be triggered individually, with outputs chained together to manage execution.
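To give a sense of how this wiring looks, the sketch below shows a LangGraph graph with a conditional edge that loops back when the two classifiers disagree. The node names, state fields, and routing logic are illustrative stand-ins, not our exact pipeline code:

```python
from typing import Literal, TypedDict

from langgraph.graph import END, START, StateGraph


class SpamState(TypedDict):
    email: str
    bert_label: str
    gpt_label: str
    final_label: str


def bert_node(state: SpamState) -> dict:
    ...  # run the BERT classifier and return {"bert_label": ...}

def gpt_node(state: SpamState) -> dict:
    ...  # run the GPT classifier and return {"gpt_label": ...}

def output_node(state: SpamState) -> dict:
    ...  # reconcile the two labels and return {"final_label": ...}

def route_after_gpt(state: SpamState) -> Literal["output", "bert"]:
    # Agreement goes straight to the output node; disagreement loops back to BERT.
    return "output" if state["bert_label"] == state["gpt_label"] else "bert"


builder = StateGraph(SpamState)
builder.add_node("bert", bert_node)
builder.add_node("gpt", gpt_node)
builder.add_node("output", output_node)
builder.add_edge(START, "bert")
builder.add_edge("bert", "gpt")
builder.add_conditional_edges("gpt", route_after_gpt)
builder.add_edge("output", END)
graph = builder.compile()
```

Because the routing decision lives in a single function attached to an edge, the GPT-BERT feedback loop never depends on the model remembering to hand off correctly.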

Swarm (A): Mostly consistent but occasionally ran the BERT or GPT agent twice in a row. While it didn’t affect the overall execution, this behavior was less than ideal. We also had to write custom transfer functions for handoffs, which felt unnecessary.
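For reference, a Swarm handoff is just a function that returns the next agent, registered alongside the agent's other tools. The agent names and instructions below are illustrative, not our exact implementation:

```python
from swarm import Agent, Swarm


def transfer_to_gpt_agent() -> Agent:
    # Returning another Agent from a registered function triggers the handoff.
    return gpt_agent


bert_agent = Agent(
    name="BERT agent",
    instructions="Classify the email with the BERT model, then hand off to the GPT agent.",
    functions=[transfer_to_gpt_agent],
)

gpt_agent = Agent(
    name="GPT agent",
    instructions="Classify the email and compare your label against the BERT label.",
)

client = Swarm()
response = client.run(
    agent=bert_agent,
    messages=[{"role": "user", "content": "Classify this email: 'You won a free cruise!'"}],
    context_variables={"bert_label": None},
)
```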

AutoGen, CrewAI (B+): AutoGen’s Swarm (not to be confused with OpenAI Swarm) multi-agent team had dedicated handoff sequences but occasionally got stuck at the GPT agent during feedback loops. CrewAI was mostly consistent but sometimes routed to the wrong agent, causing errors or premature termination.
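For context, AutoGen's Swarm team declares handoffs per agent, roughly as in the sketch below. The agent names, prompts, and termination phrase are placeholders, and the exact imports may vary across AutoGen 0.4.x releases:

```python
import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.conditions import TextMentionTermination
from autogen_agentchat.teams import Swarm
from autogen_ext.models.openai import OpenAIChatCompletionClient

model_client = OpenAIChatCompletionClient(model="gpt-4o-mini")

bert_agent = AssistantAgent(
    "bert_agent",
    model_client=model_client,
    handoffs=["gpt_agent"],
    system_message="Classify the email with BERT, then hand off to gpt_agent.",
)
gpt_agent = AssistantAgent(
    "gpt_agent",
    model_client=model_client,
    handoffs=["bert_agent", "output_agent"],
    system_message="Classify the email; on disagreement hand back to bert_agent, otherwise hand off to output_agent.",
)
output_agent = AssistantAgent(
    "output_agent",
    model_client=model_client,
    system_message="Report the final label and say TERMINATE.",
)

team = Swarm(
    [bert_agent, gpt_agent, output_agent],
    termination_condition=TextMentionTermination("TERMINATE"),
)
result = asyncio.run(team.run(task="Classify this email: 'You won a free cruise!'"))
```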

Smolagents (C): The execution sometimes skipped entire agents, jumping straight to the output agent and hallucinating results. Each step would often take multiple retries.

2. State Management

LangGraph, PydanticAI, Swarm (A+): All three frameworks handled state and parameters robustly. LangGraph uses TypedDict to define states, which are cleanly updated and managed, with a checkpointer saving state snapshots after updates or handoffs. PydanticAI leverages dataclasses for type checking, validation, and dependency management, ensuring consistent and tool-accessible states. Swarm relies on context variables in the form of Python dictionaries that are manually updated after each agent run but remain consistent and accessible throughout execution.
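As an example of the PydanticAI approach, a plain dataclass carries typed, validated state into agent runs and is accessible from tools via the run context. The field names, model string, and tool below are illustrative rather than our exact setup:

```python
from dataclasses import dataclass

from pydantic_ai import Agent, RunContext


@dataclass
class SpamDeps:
    email: str
    bert_label: str | None = None  # filled in after the BERT agent runs


gpt_agent = Agent(
    "openai:gpt-4o",
    deps_type=SpamDeps,
    system_prompt="Decide whether the email is spam or ham.",
)


@gpt_agent.tool
def bert_verdict(ctx: RunContext[SpamDeps]) -> str:
    """Expose the BERT classifier's label to the model."""
    return ctx.deps.bert_label or "not classified yet"


deps = SpamDeps(email="You won a free cruise! Click here.")
result = gpt_agent.run_sync(f"Classify this email: {deps.email}", deps=deps)
print(result.output)  # `.data` on older pydantic-ai releases
```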

AutoGen, CrewAI (A): State management in our implementation of AutoGen was mostly automated, with states embedded in LLM prompts and response structures. While consistent, it lacked granular control. CrewAI sometimes failed to pass correct parameter values to function tools, requiring careful workarounds.

Smolagents (B): State management was unstructured, with responses forced into verbose formats (e.g., requiring agents to return three answer sections: Task outcome (short), Task outcome (detailed), Additional context). This created unnecessary bloat, burying key data in extra output. However, the critical information was usually present.

3. Tool Calling

AutoGen, LangGraph, CrewAI (A+): All three frameworks offer extensive prebuilt tools with easy integration. Tools are consistently called at the right time, and integration with external libraries, APIs, or custom code is seamless.

PydanticAI, Swarm, Smolagents (A): Tool calling was reliable, with tools registered and executed correctly. However, they lack the prebuilt tool libraries and advanced integration features of AutoGen and LangGraph.
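To give a flavor of the prebuilt route, LangGraph's helpers turn a plain function into a callable tool and wire up the model-tool loop in a few lines. The tool here is a made-up example, not one from our pipeline:

```python
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent


@tool
def sender_reputation(domain: str) -> str:
    """Return a (mock) reputation rating for the sending domain."""
    return "low" if domain.endswith(".xyz") else "high"


# create_react_agent handles the tool-calling loop for us.
agent = create_react_agent(ChatOpenAI(model="gpt-4o-mini"), tools=[sender_reputation])
result = agent.invoke(
    {"messages": [("user", "Is mail from deals.xyz likely to be trustworthy?")]}
)
```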

4. Quality of Documentation

LangGraph, AutoGen, CrewAI, and PydanticAI (A): Documentation is extensive, with basic concepts and terminology clearly explained, core features detailed through technical explanations and practical examples, and specialized features covered as well. CrewAI and AutoGen stand out with plentiful multi-agent workflow examples, while LangGraph and PydanticAI have fewer. However, all four frameworks have gaps in explaining the under-the-hood workings of more complex topics like parallel execution and memory management.

Swarm and Smolagents (B): These lightweight frameworks focus on the basics, with documentation providing simple examples and straightforward explanations. However, they lack depth on advanced features and multi-agent workflows.

5. Ease of Use

Swarm, Smolagents, PydanticAI (A+): Super easy to create agents, manage state, and register tools. Chaining agent executions is straightforward, making these frameworks ideal for quick prototyping.

LangGraph (B+): Slightly higher complexity due to graphical workflows and an extensive feature list. However, creating the actual agents and tools remains straightforward, balancing power with usability.

AutoGen, CrewAI (B): More challenging to use, especially when configuring prebuilt multi-agent teams and handling state management. These frameworks require more effort and carefully crafted prompts to get workflows running smoothly.

Final Grades: Which Framework Should You Choose?

1. LangGraph, PydanticAI (A+): Ideal for building chatbot agents and real-time workflows. LangGraph offers streaming of LLM responses, robust state management, granular control over execution, and prebuilt agents and tools. PydanticAI is the most user-friendly framework, thanks to excellent documentation and reliable message passing, though debugging event execution errors can be challenging and it lacks prebuilt tool libraries and advanced integrations.

2. AutoGen (A): AutoGen excels in large-scale multi-agent systems with speed, scalability, and flexible team patterns but has a steeper learning curve.

3. CrewAI (B+): CrewAI is great for prototyping with a diverse feature set, but has inconsistent message passing, making it unsuitable for production-level systems.

4. Swarm, Smolagents (B): Beginner-friendly but limited. Swarm helps create smooth workflows but relies heavily on OpenAI models. Smolagents uses AutoGen for multi-agent tasks but struggles with consistency.
