
Architecture Patterns for Autonomous AI Agents

abemon | 12 min read | Written by practitioners

The monolithic agent is a dead end

We started like everyone does. One prompt, one model, a list of tools, and see what happens. Our first production agent processed customer emails for a logistics company. It classified, extracted data, and drafted a response. All in a single call.

It worked for three weeks. Then a customer sent an email with a PDF attachment containing a scanned image of a delivery note. The agent tried to parse the PDF, failed, retried with a different strategy, failed again, and entered a reasoning loop that consumed 47,000 tokens before the circuit breaker killed it. Cost: EUR 1.20 for one email. The average was EUR 0.03.

That incident forced us to rethink the architecture. What follows are the four patterns we use now, with the real tradeoffs of each.

Pattern 1: ReAct with strict bounds

ReAct (Reasoning + Acting) is the simplest pattern that can work in production. The agent reasons about the task, decides which tool to use, observes the result, and repeats until the task is complete or a limit is reached.

The key is the limits. Without them, ReAct is a blank check. Our constraints:

  • Maximum steps: 8 for simple tasks, 15 for complex ones. If the agent does not resolve in 15 steps, it escalates.
  • Token budget: ceiling per execution. At 80% of the limit, the agent receives an instruction to compress its reasoning.
  • Per-tool timeout: 30 seconds. If an external API does not respond, the agent logs the failure and decides whether to use a fallback or escalate.
  • Closed tool list: the agent can only use the tools we assign. It does not discover tools dynamically.

The cycle is: Observe -> Think -> Act -> Verify result -> Repeat or finalize.

Where it works well: tasks with 2-5 steps, predictable tools, where the model can reason linearly. Document classification with field extraction. API queries with result interpretation. Data-driven response generation.

Where it fails: tasks requiring coordination across multiple data sources, or where the decision space is too broad for a single agent.

Practical implementation: we use LangGraph with a single-node graph that runs the ReAct cycle. The state includes the action history, tokens consumed, and a step counter. The exit condition is: task completed, limit reached, or unrecoverable error.
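The bounded loop can be sketched in plain Python. This is a minimal sketch, not the LangGraph implementation the text describes: `llm_decide` and `run_tool` are hypothetical stand-ins for the model call and the tool executor, and the state fields (history, token count, step counter) mirror the ones listed above.

```python
# Bounded ReAct loop: exits on completion, step limit, or token budget.
# llm_decide and run_tool are hypothetical stand-ins for the real calls.

MAX_STEPS = 15
TOKEN_BUDGET = 20_000

def react_loop(task, llm_decide, run_tool):
    history, tokens_used = [], 0
    for step in range(MAX_STEPS):
        # At 80% of the budget, instruct the model to compress its reasoning.
        compress = tokens_used >= 0.8 * TOKEN_BUDGET
        action, tokens = llm_decide(task, history, compress=compress)
        tokens_used += tokens
        if action["type"] == "final":
            return {"status": "completed", "output": action["output"],
                    "steps": step + 1, "tokens": tokens_used}
        if tokens_used > TOKEN_BUDGET:
            return {"status": "escalated", "reason": "token budget exceeded"}
        # Closed tool list: run_tool only accepts tools we assigned.
        observation = run_tool(action["tool"], action["args"], timeout=30)
        history.append((action, observation))
    return {"status": "escalated", "reason": "step limit reached"}
```

The escalation returns make the exit condition explicit: the caller always knows whether the task completed or why it did not.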

Pattern 2: Supervisor with specialized workers

This is our default pattern for complex tasks. A supervisor agent orchestrates multiple worker agents, each specialized in a specific capability.

The structure:

Supervisor
├── Document reader (extraction, OCR)
├── Data consultant (external APIs, databases)
├── Output generator (responses, reports)
└── Validator (verifies result quality)

The supervisor does not execute tools directly. Its job is: receive the task, decompose it into subtasks, assign each subtask to the appropriate worker, receive results, decide if more steps are needed, and compose the final output.

Each worker is a bounded ReAct agent with its own set of tools and limits. The document reader only has access to parsing and OCR tools. The data consultant only to APIs and databases. This separation has three benefits:

Security. A compromised or hallucinating worker cannot execute tools outside its scope. The document reader cannot send emails.

Testability. You can test each worker in isolation with known inputs and outputs.

Cost. Each worker can use a different model. The classifier uses Haiku (cheap, fast). The output generator uses Sonnet (better text quality). Only the supervisor needs a model with strong reasoning capability.

The main tradeoff is latency. A task that a monolithic agent would solve in 3 seconds might take 8-12 seconds with supervisor + workers due to multiple LLM roundtrips. For back-office tasks, that is irrelevant. For real-time user interactions, it can be a problem.

Implementation: in LangGraph, the supervisor is a node that contains a subgraph for each worker. The global graph state contains the task, subtasks, partial results, and final result. Transitions between nodes are conditioned by the supervisor’s decisions.
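Stripped of the LangGraph wiring, the supervisor's job reduces to a routing loop. A minimal sketch, assuming a hypothetical `decompose` function and a closed registry of worker callables (in production each worker would be a bounded ReAct subgraph):

```python
# Supervisor sketch: decompose the task, route each subtask to the worker
# registered for its capability, then compose the final output. The
# supervisor itself executes no tools.

def supervise(task, decompose, workers, compose):
    results = []
    for subtask in decompose(task):
        # Closed registry: a KeyError here means the supervisor tried to
        # assign a capability no worker owns, which should fail loudly.
        worker = workers[subtask["capability"]]
        results.append(worker(subtask))
    return compose(results)
```

Because each worker only sees its own subtask and its own tools, the security and testability benefits above fall out of the structure: a worker can be exercised in isolation simply by calling it with a known subtask.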

Pattern 3: Finite state machine with LLM

This pattern is for processes where the flow is known but the decisions within each step require an LLM. Think of an invoice approval workflow:

Receive invoice -> Extract data -> Validate against PO ->
  [OK] -> Record in ERP -> Notify
  [Error] -> Request correction -> Wait -> Re-validate
  [Uncertain] -> Escalate to human

The flow is a state machine. Each transition is deterministic. But within each state, the LLM decides. In “Extract data,” the agent parses the document. In “Validate against PO,” it compares extracted fields with ERP data. In “Request correction,” it drafts an email to the supplier.

The advantage over the supervisor pattern: it is far more predictable. The flow cannot escape the defined states. There is no possibility of the agent deciding “I will do something completely different.” Costs are predictable because each state has a fixed token budget.

The disadvantage: it is rigid. If a case arrives that does not fit the defined states, there is no way to handle it without modifying the flow. Each new case type requires updating code, not just adjusting a prompt.

When we use it: for regulated business processes or those with strict SLAs. Invoice processing, approval workflows, customer onboarding. Any process where predictability matters more than flexibility.

Implementation: LangGraph is ideal for this. Each state is a node. Transitions are conditional edges. The graph state contains the process data and each step’s result. We can serialize the full state, which allows pausing and resuming processes (for example, when waiting for human approval).
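The invoice flow above can be sketched as a transition table plus per-state handlers. This is an illustrative reduction, not the LangGraph graph itself: each handler stands in for the LLM decision inside a state, and the table guarantees the flow cannot leave the defined states.

```python
# Finite state machine for the invoice workflow: transitions are a fixed
# table; the LLM (here, a handler callable) only decides *within* a state.

TRANSITIONS = {
    ("validate", "ok"): "record",
    ("validate", "error"): "request_correction",
    ("validate", "uncertain"): "escalate",
    ("request_correction", "corrected"): "validate",
}

def run_workflow(state, handlers):
    # "record" and "escalate" are terminal states.
    while state not in ("record", "escalate"):
        outcome = handlers[state]()            # LLM decision inside the state
        state = TRANSITIONS[(state, outcome)]  # deterministic transition
    return state
```

Because the state is a plain value, serializing it for pause-and-resume (e.g. while waiting for human approval) is trivial.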

Pattern 4: Event-driven with reactive agents

The pattern for systems that need to respond to real-time events without a central orchestrator.

The architecture:

Event queue (Kafka / Redis Streams)
├── Classification agent (listens: email.received)
├── Processing agent (listens: document.classified)
├── Notification agent (listens: task.completed, task.failed)
└── Monitoring agent (listens: *.*)

Each agent subscribes to specific event types. When an event arrives, it processes it and may emit new events that trigger other agents. There is no central controller; the flow emerges from the composition of events.

This pattern is the most scalable. You can add or remove agents without touching existing ones. If you need a new type of processing, deploy a new agent that listens to the relevant events. If an agent fails, events stay in the queue and are retried.

The main risk, as we covered in our AI agents whitepaper: event storms. An event that generates another event that generates another can create cascades that consume resources without control. The solution has three layers:

  1. Deduplication. Each event has a unique ID. If an agent already processed that ID, it ignores it.
  2. Rate limiting. Each agent has a maximum executions per minute. If reached, events queue until there is capacity.
  3. Dead letter queue. Events that fail three times go to a dead letter queue for manual review.
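The three layers compose naturally in a single consumer wrapper. A minimal sketch (the `GuardedConsumer` class and its return codes are illustrative, not a real Kafka/Redis API; a production version would persist `seen_ids` and the failure counts):

```python
import time
from collections import deque

# Three-layer guard for an event consumer: dedup by event ID, per-minute
# rate limiting, and a dead letter queue after three failed attempts.

class GuardedConsumer:
    def __init__(self, handler, max_per_minute=60):
        self.handler = handler
        self.max_per_minute = max_per_minute
        self.seen_ids = set()        # layer 1: deduplication
        self.recent = deque()        # layer 2: timestamps for rate limiting
        self.failures = {}           # layer 3: failure counts per event ID
        self.dead_letter = []

    def consume(self, event, now=None):
        now = time.monotonic() if now is None else now
        if event["id"] in self.seen_ids:
            return "duplicate"
        # Drop timestamps older than the 60-second window.
        while self.recent and now - self.recent[0] > 60:
            self.recent.popleft()
        if len(self.recent) >= self.max_per_minute:
            return "rate_limited"    # caller leaves the event in the queue
        self.recent.append(now)
        try:
            self.handler(event)
        except Exception:
            n = self.failures.get(event["id"], 0) + 1
            self.failures[event["id"]] = n
            if n >= 3:
                self.dead_letter.append(event)
                return "dead_letter"
            return "retry"
        self.seen_ids.add(event["id"])
        return "processed"
```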

When we use it: for systems that already have an event-based microservice architecture. If your backend already uses Kafka or RabbitMQ, adding agents as consumers is natural. If your architecture is monolithic, this pattern introduces complexity you probably do not need.

Cross-cutting components

Regardless of the pattern, three components appear in every project.

Structured outputs with validation

We never let an agent return free-form text when we need structured data. Every tool defines a Pydantic or Zod schema for its output. If the LLM output does not conform to the schema, it is rejected and retried with an explicit error instruction.

In practice, Claude and GPT-4o conform to the schema correctly 95-98% of the time on the first call. But the remaining 2-5% is enough to generate dozens of errors per day at high volume. Validation is mandatory, not optional.
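The validate-and-retry loop looks like this. A dependency-free sketch: in production the validator would be a Pydantic or Zod schema as described above, and `call_llm` is a hypothetical stand-in for the model call.

```python
# Reject-and-retry loop for structured outputs. validate_invoice stands in
# for a Pydantic/Zod schema; call_llm stands in for the model call.

def validate_invoice(data):
    errors = []
    if not isinstance(data.get("invoice_no"), str):
        errors.append("invoice_no must be a string")
    if not isinstance(data.get("total"), (int, float)):
        errors.append("total must be a number")
    return errors

def structured_call(call_llm, prompt, validate, max_retries=2):
    feedback = ""
    for _ in range(max_retries + 1):
        data = call_llm(prompt + feedback)
        errors = validate(data)
        if not errors:
            return data
        # Retry with an explicit error instruction, as described above.
        feedback = "\nYour last output was invalid: " + "; ".join(errors)
    raise ValueError("schema validation failed after retries")
```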

Integrated observability

Every agent emits OpenTelemetry traces with custom spans for each action type. Traces include: input and output tokens, latency per call, result of each tool, and agent decision at each step.

We use LangSmith for visualization and debugging, but traces also flow to our general observability stack (Grafana + Loki) for correlation with infrastructure metrics.
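The per-action attributes the traces carry can be illustrated with a minimal span context manager. This is a stand-in for OpenTelemetry's tracer, not its API; the `TRACES` list plays the role of the exporter.

```python
import time
from contextlib import contextmanager

# Minimal stand-in for an OpenTelemetry span: records the attributes the
# text lists (tokens, latency, tool result, agent decision) per action.

TRACES = []  # exporter stand-in

@contextmanager
def agent_span(action, **attributes):
    span = {"action": action, **attributes}
    start = time.monotonic()
    try:
        yield span  # the handler adds attributes, e.g. span["result"] = ...
    finally:
        span["latency_s"] = time.monotonic() - start
        TRACES.append(span)
```

Usage: `with agent_span("tool_call", tool="ocr", input_tokens=120) as s: s["result"] = "ok"`. Because latency is recorded in `finally`, failed actions still produce a span.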

Circuit breakers and fallbacks

Every tool an agent can call has a circuit breaker. If it fails 3 consecutive times, it is disabled for 5 minutes. The agent receives a notification that the tool is unavailable and can decide to use a fallback or escalate.

At the agent level, if the error rate exceeds 20% in a 5-minute window, the entire agent is disabled and tasks are routed to the manual processing queue. This prevents silent degradation where the agent keeps running but produces low-quality results.
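The tool-level breaker is small enough to sketch directly. A minimal sketch of the stated policy (3 consecutive failures, 5-minute cooldown); the class name and `record`/`available` methods are illustrative, not a specific library's API.

```python
import time

# Tool-level circuit breaker: after 3 consecutive failures the tool is
# disabled for 5 minutes, per the policy described above.

class CircuitBreaker:
    def __init__(self, threshold=3, cooldown_s=300):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.consecutive_failures = 0
        self.open_until = 0.0

    def available(self, now=None):
        now = time.monotonic() if now is None else now
        return now >= self.open_until

    def record(self, success, now=None):
        now = time.monotonic() if now is None else now
        if success:
            self.consecutive_failures = 0  # any success resets the streak
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.threshold:
                self.open_until = now + self.cooldown_s
                self.consecutive_failures = 0
```

When `available()` returns False, the agent is told the tool is down and chooses a fallback or escalates, exactly as described above.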

How to choose the right pattern

The decision depends on three variables:

Variable              ReAct       Supervisor  State machine  Event-driven
Task complexity       Low-medium  High        Medium-high    Variable
Flow predictability   Low         Medium      High           Low
Latency requirements  Low         Medium      Low            Medium
Ease of testing       Medium      High        Very high      Medium
Infrastructure cost   Low         Medium      Low            High

The practical rule: start with ReAct. If it falls short, move to supervisor. If the flow is known and regulated, use a state machine. If you already have an event architecture, use event-driven.

Do not mix patterns without a clear reason. We have seen teams implement a supervisor that contains state machines that emit events that trigger other supervisors. The result is unmaintainable. Choose one primary pattern and use it consistently.

For a deeper dive into these patterns applied to real use cases, check our AI and Machine Learning service. If you are evaluating which pattern fits your case, our consulting team offers 2-week technical assessments.

We also recommend reviewing our data engineering practice to understand how to prepare the data that will feed your agents.

About the author

abemon engineering

Engineering team

Multidisciplinary engineering, data and AI team headquartered in the Canary Islands. We build, deploy and operate custom software solutions for companies at any scale.