Decision loops are the heart of any truly agentic system. They're what separate intelligent agents from simple scripts. But implementing them correctly? That's where things get tricky.

Let me walk you through what we've learned building decision loops for Fantomu—the mistakes we made, the solutions we found, and the patterns that actually work.

Why Linear Workflows Break Down

Most automation follows a simple pattern: execute step 1, then step 2, then step 3. If anything fails, stop and alert a human.

This works great for predictable scenarios. But real-world problems are messy. APIs go down. Data formats change. Edge cases appear that you never anticipated.

That's where decision loops come in. Instead of blindly following steps, an agent evaluates its situation, makes a decision, acts on it, and then evaluates again. It adapts based on what it discovers.

The Anatomy of a Decision Loop

At its core, a decision loop has five stages:

Observe: Gather information about the current state
Evaluate: Analyze the situation and determine what needs to happen
Decide: Choose a specific action based on the evaluation
Act: Execute the chosen action
Reflect: Assess whether the goal was achieved, then loop back to observe
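
To make the five stages concrete, here is a minimal sketch of the loop in Python. The `LoopState` type, the function name, and the toy "count up to a target" goal are hypothetical and chosen only for illustration; this is not Fantomu's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class LoopState:
    """Hypothetical state container for this sketch."""
    value: int = 0
    target: int = 3
    history: list[str] = field(default_factory=list)


def run_decision_loop(state: LoopState, max_iterations: int = 10) -> bool:
    """Run observe -> evaluate -> decide -> act -> reflect until the goal is met.

    Returns True if the goal was reached within max_iterations, else False.
    """
    for _ in range(max_iterations):
        # Observe: gather information about the current state.
        observed = state.value
        # Evaluate: work out how far we are from the goal.
        gap = state.target - observed
        # Decide: choose one concrete action from a fixed set.
        if gap > 0:
            action = "increment"
        elif gap < 0:
            action = "decrement"
        else:
            action = "noop"
        # Act: execute the chosen action.
        if action == "increment":
            state.value += 1
        elif action == "decrement":
            state.value -= 1
        state.history.append(action)
        # Reflect: check whether the goal was achieved; otherwise loop again.
        if state.value == state.target:
            return True
    return False
```

In a real agent each stage would call out to tools or a model, but the shape stays the same: a bounded loop in which every pass ends by checking the goal.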

This sounds simple, but each stage has its own challenges. Let's break them down.

Structured Outputs: The Foundation

Before we dive into implementation, there's one critical rule: never parse natural language for decisions.

I've seen too many systems fail because they asked an LLM "What should I do?" and tried to parse responses like "I think you should probably retry, but maybe with a different approach, or perhaps wait a bit first?"

Instead, always use structured outputs. Define a clear schema for decisions:

{
  "action": "retry" | "clarify" | "abort" | "pivot",
  "strategy": "exponential_backoff" | "different_endpoint" | 
              "simplified_request" | "manual_review",
  "reasoning": "Brief explanation of why this action was chosen",
  "confidence": 0.0-1.0,
  "nextStep": "Specific action to take",
  "maxRetries": 3
}

This removes the ambiguity. As long as each response is validated against the schema before the agent acts on it, every decision is parseable, actionable, and debuggable.
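
The schema above uses `|` to list the allowed values, so it is a description rather than literal JSON. A minimal sketch of how a response could be checked against it, using only the Python standard library, might look like this. The `Decision` class and `parse_decision` function are hypothetical names, and rejecting anything that fails validation is an assumption about how you would want to handle bad output.

```python
import json
from dataclasses import dataclass

# Allowed values copied from the schema in the text.
ALLOWED_ACTIONS = ("retry", "clarify", "abort", "pivot")
ALLOWED_STRATEGIES = (
    "exponential_backoff", "different_endpoint",
    "simplified_request", "manual_review",
)


@dataclass(frozen=True)
class Decision:
    action: str
    strategy: str
    reasoning: str
    confidence: float
    next_step: str
    max_retries: int


def parse_decision(raw: str) -> Decision:
    """Parse and validate a model response; raise ValueError on any violation."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("decision must be a JSON object")
    if data.get("action") not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {data.get('action')!r}")
    if data.get("strategy") not in ALLOWED_STRATEGIES:
        raise ValueError(f"unknown strategy: {data.get('strategy')!r}")
    confidence = data.get("confidence")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    max_retries = data.get("maxRetries")
    if isinstance(max_retries, bool) or not isinstance(max_retries, int) or max_retries < 0:
        raise ValueError("maxRetries must be a non-negative integer")
    return Decision(
        action=data["action"],
        strategy=data["strategy"],
        reasoning=str(data.get("reasoning", "")),
        confidence=float(confidence),
        next_step=str(data.get("nextStep", "")),
        max_retries=max_retries,
    )
```

A free-text answer such as "I think you should probably retry" fails at the first step, which is the point: the loop never has to guess what the model meant.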

Building Safeguards

A decision loop without safeguards is a recipe for infinite loops and runaway processes. Here's what we've learned to include:

Iteration Limits

Set a maximum number of iterations. If the agent hasn't achieved its goal after N attempts, it should either ask for help or abort gracefully.

We typically start with 5-10 iterations, but this varies based on the complexity of the task.

Confidence Thresholds

If the agent's confidence in its decision drops below a threshold (we use 0.7), it should ask for human clarification rather than guessing.

This prevents the agent from making bad decisions when it's uncertain.
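
As a rough sketch, the threshold check can be a small gate applied to every decision before it is executed. The function name is hypothetical, and mapping a low-confidence decision to the "clarify" action with the "manual_review" strategy is an assumption based on the schema earlier in this post.

```python
CONFIDENCE_THRESHOLD = 0.7  # the value quoted in the text; tune it per task


def gate_on_confidence(decision: dict, threshold: float = CONFIDENCE_THRESHOLD) -> dict:
    """Return the decision unchanged, or a copy that asks a human for clarification."""
    if decision["confidence"] >= threshold:
        return decision
    # Below the threshold: do not guess, escalate instead.
    return {
        **decision,
        "action": "clarify",
        "strategy": "manual_review",
        "nextStep": "Ask a human to confirm before proceeding",
    }
```

The original dictionary is left untouched, so the low-confidence decision is still available for logging and debugging.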

Error Pattern Detection

Track errors. If the same error occurs three times in a row, the agent should try a fundamentally different approach rather than repeating the same failed action.

This prevents the agent from getting stuck in a loop of retrying the same broken approach.
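
One way to detect this pattern is to keep a short window of recent error signatures. The sketch below assumes errors can be reduced to a comparable string such as "HTTP 500"; the `ErrorTracker` class and its method names are hypothetical.

```python
from collections import deque


class ErrorTracker:
    """Remember the most recent errors and flag a repeated failure."""

    def __init__(self, repeat_limit: int = 3) -> None:
        self.repeat_limit = repeat_limit
        # Only the last `repeat_limit` errors matter for the check.
        self._recent = deque(maxlen=repeat_limit)

    def record_error(self, signature: str) -> None:
        self._recent.append(signature)

    def record_success(self) -> None:
        # A success breaks the streak.
        self._recent.clear()

    def should_pivot(self) -> bool:
        """True when the last `repeat_limit` errors are all identical."""
        return (
            len(self._recent) == self.repeat_limit
            and len(set(self._recent)) == 1
        )
```

When `should_pivot()` returns True, the loop can force the next decision to use the "pivot" action instead of another retry.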

Timeout Handling

Set timeouts for each iteration and for the overall task. If things are taking too long, abort gracefully and report what was accomplished.
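
A minimal sketch of both timeouts together is shown below. It assumes each step accepts the number of seconds it is allowed to use and honors it (for example by passing it on to an HTTP client); the function name, the result dictionary, and the injectable `clock` parameter are all hypothetical.

```python
import time
from typing import Callable, Optional


def run_with_deadline(
    step: Callable[[float], Optional[str]],
    overall_timeout: float,
    per_step_timeout: float,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call step(timeout) until it returns a result or the time budget runs out.

    `step` returns a result string, or None if the goal is not reached yet.
    """
    deadline = clock() + overall_timeout
    completed_steps = 0
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            # Abort gracefully and report what was accomplished.
            return {
                "status": "timed_out",
                "completed_steps": completed_steps,
                "result": None,
            }
        # Never give a single step more time than the overall budget has left.
        result = step(min(per_step_timeout, remaining))
        completed_steps += 1
        if result is not None:
            return {
                "status": "done",
                "completed_steps": completed_steps,
                "result": result,
            }
```

Using a monotonic clock avoids surprises when the system time changes mid-task, and passing the clock in makes the timeout logic easy to test without real waiting.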

Intelligent Retry Logic

When something fails, don't just retry the exact same action. That's not intelligent—that's just persistence.

Instead, implement adaptive retry logic:

Analyze the failure: What went wrong? Was it a network error? A validation error? A timeout?
Choose a strategy: Based on the error type, select an appropriate retry strategy
Adjust parameters: Modify the approach (wait longer, use a different endpoint, simplify the request)
Try again: Execute with the new approach
Learn: If it still fails, try a fundamentally different strategy

For example, if an API call fails with a 429 (rate limited), the agent should wait and retry with exponential backoff. If it fails with a 404 (not found), it should try a different endpoint. If it fails with a 500 (server error), it might try a simplified request or a different service entirely.
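
The mapping from status code to strategy can be sketched in a few lines. The strategy names come from the schema earlier in this post; the function names, the 2-second base delay, the 60-second cap, and the fallback to "manual_review" for any other status are assumptions made for this example.

```python
def choose_strategy(status_code: int) -> str:
    """Pick a retry strategy based on the HTTP status of the failed call."""
    if status_code == 429:
        return "exponential_backoff"   # rate limited: wait, then retry
    if status_code == 404:
        return "different_endpoint"    # not found: try another endpoint
    if status_code == 500:
        return "simplified_request"    # server error: ask for less
    return "manual_review"             # assumption: anything else goes to a human


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (1-based), doubling each time."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    return min(cap, base * 2 ** (attempt - 1))
```

With these defaults the waits are 2, 4, and 8 seconds for the first three attempts. Production systems often add random jitter on top so that many clients do not retry at the same instant.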

Real-World Example

Here's a simplified example from Fantomu. An agent needs to fetch user data from an API:

Observe: Check if we have a valid API token and the user ID
Evaluate: Determine we need to make an API call
Decide: Choose to call the primary endpoint with an exponential backoff strategy
Act: Make the API call
Reflect: Got a 429 error (rate limited)
Observe: We're being rate limited; this is attempt #1
Evaluate: We should wait and retry
Decide: Wait 2 seconds, then retry with the same strategy
Act: Wait, then retry
Reflect: Got a 200 response with user data
Goal achieved: Exit loop

This is a simple example, but it shows how the agent adapts based on what it discovers, rather than just following a script.
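
The trace above can be replayed in code with a stand-in for the API. Everything here is hypothetical: `FakeUserApi` simply returns a 429 on the first call and a 200 afterwards, and `fetch_user` is a sketch of the loop rather than Fantomu's actual code.

```python
import time
from typing import Callable


class FakeUserApi:
    """Stand-in for a real API: rate-limits the first call, then succeeds."""

    def __init__(self) -> None:
        self.calls = 0

    def get_user(self, user_id: str) -> tuple[int, dict]:
        self.calls += 1
        if self.calls == 1:
            return 429, {}
        return 200, {"id": user_id, "name": "Example User"}


def fetch_user(
    api: FakeUserApi,
    user_id: str,
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Fetch a user, backing off on 429 and giving up on anything unexpected."""
    trace = []
    for attempt in range(1, max_attempts + 1):
        # Act: make the API call.
        status, body = api.get_user(user_id)
        # Reflect: record what happened.
        trace.append(f"attempt {attempt}: HTTP {status}")
        if status == 200:
            return {"user": body, "trace": trace}  # goal achieved, exit loop
        if status == 429:
            # Decide: wait with exponential backoff (2s, 4s, 8s, ...), then retry.
            delay = 2.0 * 2 ** (attempt - 1)
            trace.append(f"waiting {delay:.0f}s")
            sleep(delay)
            continue
        break  # any other status is out of scope for this sketch
    return {"user": None, "trace": trace}
```

Passing `sleep` as a parameter lets a test substitute a function that records the delays instead of actually waiting.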

Lessons Learned

Building decision loops is hard. It requires careful design, extensive testing, and constant iteration. But when it works, it's magical—the agent feels genuinely intelligent.

Start simple. Build a basic loop first, then add complexity only when you need it. Use structured outputs everywhere. And always, always include safeguards.

Your future self—and your users—will thank you.

Building Decision Loops for AI Agents