Decision loops are the heart of any truly agentic system. They're what separates intelligent agents from simple scripts. But implementing them correctly? That's where things get tricky.
Let me walk you through what we've learned building decision loops for Fantomu—the mistakes we made, the solutions we found, and the patterns that actually work.
Why Linear Workflows Break Down
Most automation follows a simple pattern: execute step 1, then step 2, then step 3. If anything fails, stop and alert a human.
This works great for predictable scenarios. But real-world problems are messy. APIs go down. Data formats change. Edge cases appear that you never anticipated.
That's where decision loops come in. Instead of blindly following steps, an agent evaluates its situation, makes a decision, acts on it, and then evaluates again. It adapts based on what it discovers.
The Anatomy of a Decision Loop
At its core, a decision loop has five stages:
- Observe: Gather information about the current state
- Evaluate: Analyze the situation and determine what needs to happen
- Decide: Choose a specific action based on the evaluation
- Act: Execute the chosen action
- Reflect: Assess whether the goal was achieved, then loop back to observe
This sounds simple, but each stage has its own challenges. Let's break them down.
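Before going stage by stage, here is a minimal sketch of the loop as a whole. It assumes each stage can be passed in as a plain function; the names `run_decision_loop` and `LoopResult` and the toy counter task are made up for illustration and are not Fantomu's actual code.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopResult:
    achieved: bool
    iterations: int
    history: list = field(default_factory=list)


def run_decision_loop(
    observe: Callable[[], dict],
    evaluate: Callable[[dict], str],
    decide: Callable[[str], str],
    act: Callable[[str], object],
    reflect: Callable[[object], bool],
    max_iterations: int = 5,
) -> LoopResult:
    """Run observe -> evaluate -> decide -> act -> reflect until the goal is met."""
    history = []
    for iteration in range(1, max_iterations + 1):
        state = observe()              # Observe: gather the current state
        assessment = evaluate(state)   # Evaluate: what needs to happen?
        action = decide(assessment)    # Decide: pick one concrete action
        outcome = act(action)          # Act: execute it
        history.append((iteration, action, outcome))
        if reflect(outcome):           # Reflect: was the goal achieved?
            return LoopResult(True, iteration, history)
    # Iteration budget exhausted without reaching the goal.
    return LoopResult(False, max_iterations, history)


# Toy task (hypothetical): keep incrementing a counter until it reaches 3.
counter = {"value": 0}


def increment(action: str) -> int:
    counter["value"] += 1
    return counter["value"]


result = run_decision_loop(
    observe=lambda: dict(counter),
    evaluate=lambda state: "below_target" if state["value"] < 3 else "done",
    decide=lambda assessment: "increment",
    act=increment,
    reflect=lambda outcome: outcome >= 3,
)
```

The loop is bounded by `max_iterations` from the start, so even this toy version cannot run forever.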
Structured Outputs: The Foundation
Before we dive into implementation, there's one critical rule: never parse natural language for decisions.
I've seen too many systems fail because they asked an LLM "What should I do?" and tried to parse responses like "I think you should probably retry, but maybe with a different approach, or perhaps wait a bit first?"
Instead, always use structured outputs. Define a clear schema for decisions:
{
  "action": "retry" | "clarify" | "abort" | "pivot",
  "strategy": "exponential_backoff" | "different_endpoint" | "simplified_request" | "manual_review",
  "reasoning": "Brief explanation of why this action was chosen",
  "confidence": 0.0-1.0,
  "nextStep": "Specific action to take",
  "maxRetries": 3
}
This eliminates ambiguity. Once the output validates against the schema, the decision is parseable, actionable, and debuggable.
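One way to enforce the schema above is to validate every decision before acting on it. The sketch below uses only the Python standard library. The `Decision` class and `parse_decision` function are illustrative names, and the rule that `maxRetries` must be a non-negative integer is an assumption, since the schema only shows the value 3.

```python
import json
from dataclasses import dataclass

# Allowed values, taken from the schema in the text.
ACTIONS = ("retry", "clarify", "abort", "pivot")
STRATEGIES = ("exponential_backoff", "different_endpoint",
              "simplified_request", "manual_review")


@dataclass(frozen=True)
class Decision:
    action: str
    strategy: str
    reasoning: str
    confidence: float
    next_step: str
    max_retries: int


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def parse_decision(raw: str) -> Decision:
    """Parse a JSON decision and reject anything outside the schema."""
    data = json.loads(raw)  # malformed JSON raises json.JSONDecodeError (a ValueError)
    if not isinstance(data, dict):
        raise ValueError("decision must be a JSON object")
    if data.get("action") not in ACTIONS:
        raise ValueError(f"unknown action: {data.get('action')!r}")
    if data.get("strategy") not in STRATEGIES:
        raise ValueError(f"unknown strategy: {data.get('strategy')!r}")
    confidence = data.get("confidence")
    if not _is_number(confidence) or not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be a number between 0.0 and 1.0")
    max_retries = data.get("maxRetries")
    if not _is_number(max_retries) or not isinstance(max_retries, int) or max_retries < 0:
        raise ValueError("maxRetries must be a non-negative integer")
    return Decision(
        action=data["action"],
        strategy=data["strategy"],
        reasoning=str(data.get("reasoning", "")),
        confidence=float(confidence),
        next_step=str(data.get("nextStep", "")),
        max_retries=max_retries,
    )
```

A rambling answer like "I think you should probably retry" never reaches the agent's control flow: it fails at `json.loads` or at the `action` check, and the caller can treat that `ValueError` as one more observable failure.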
Building Safeguards
A decision loop without safeguards is a recipe for infinite loops and runaway processes. Here's what we've learned to include:
Iteration Limits
Set a maximum number of iterations. If the agent hasn't achieved its goal after N attempts, it should either ask for help or abort gracefully.
We typically start with 5-10 iterations, but this varies based on the complexity of the task.
Confidence Thresholds
If the agent's confidence in its decision drops below a threshold (we use 0.7), it should ask for human clarification rather than guessing.
This prevents the agent from making bad decisions when it's uncertain.
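A confidence gate can be a few lines sitting between "decide" and "act". This is a sketch, assuming decisions are dicts shaped like the schema above. Mapping a low-confidence decision to `clarify` with the `manual_review` strategy is an illustrative choice, and `gate_on_confidence` is a hypothetical name.

```python
CONFIDENCE_THRESHOLD = 0.7  # the value quoted in the text


def gate_on_confidence(decision: dict, threshold: float = CONFIDENCE_THRESHOLD) -> dict:
    """Pass confident decisions through; turn uncertain ones into a request for help."""
    if decision["confidence"] >= threshold:
        return decision
    # Below the threshold: ask a human instead of guessing.
    return {
        **decision,
        "action": "clarify",
        "strategy": "manual_review",
        "reasoning": (
            f"confidence {decision['confidence']:.2f} is below {threshold:.2f}; "
            f"original action was {decision['action']!r}"
        ),
    }
```

Keeping the original action in the `reasoning` field means the human reviewer can see what the agent was leaning toward.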
Error Pattern Detection
Track errors. If the same error occurs three times in a row, the agent should try a fundamentally different approach rather than repeating the same failed action.
This prevents the agent from getting stuck in a loop of retrying the same broken approach.
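Detecting "the same error three times in a row" only needs a short memory of recent errors. The sketch below assumes each error can be reduced to a comparable signature string (for example an error class or status code); `ErrorPatternDetector` is an illustrative name.

```python
from collections import deque


class ErrorPatternDetector:
    """Flags when the same error signature has occurred `limit` times in a row."""

    def __init__(self, limit: int = 3):
        self.limit = limit
        self._recent = deque(maxlen=limit)  # keeps only the last `limit` errors

    def record_error(self, signature: str) -> bool:
        """Record an error; return True if the agent should switch approach."""
        self._recent.append(signature)
        return len(self._recent) == self.limit and len(set(self._recent)) == 1

    def record_success(self) -> None:
        """A success breaks the streak."""
        self._recent.clear()
```

When `record_error` returns True, the loop can force the next decision to be something other than a plain retry.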
Timeout Handling
Set timeouts for each iteration and for the overall task. If things are taking too long, abort gracefully and report what was accomplished.
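Here is one possible shape for the two levels of timeout. It is a cooperative sketch: each step is told how many seconds it may use and is expected to honor that itself (for instance by passing it to an HTTP client's timeout argument), so nothing here forcibly interrupts a running step. `Deadline` and `run_with_deadlines` are hypothetical names.

```python
import time


class Deadline:
    """Tracks a time budget using a monotonic clock."""

    def __init__(self, seconds: float, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + seconds

    def remaining(self) -> float:
        return max(0.0, self._expires_at - self._clock())

    def expired(self) -> bool:
        return self.remaining() <= 0.0


def run_with_deadlines(steps, task_timeout: float, iteration_timeout: float,
                       clock=time.monotonic) -> dict:
    """Run steps in order; stop once the overall budget is spent and report progress.

    Each step receives its time budget in seconds: the smaller of the
    per-iteration timeout and whatever is left of the overall task budget.
    """
    task = Deadline(task_timeout, clock)
    completed = []
    for step in steps:
        if task.expired():
            # Abort gracefully and report what was accomplished so far.
            return {"status": "timed_out", "completed": completed}
        budget = min(iteration_timeout, task.remaining())
        completed.append(step(budget))
    return {"status": "done", "completed": completed}
```

The `clock` parameter exists so tests can inject a fake clock instead of actually waiting. `time.monotonic` is used rather than `time.time` because it is not affected by system clock adjustments.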
Intelligent Retry Logic
When something fails, don't just retry the exact same action. That's not intelligent—that's just persistence.
Instead, implement adaptive retry logic:
- Analyze the failure: What went wrong? Was it a network error? A validation error? A timeout?
- Choose a strategy: Based on the error type, select an appropriate retry strategy
- Adjust parameters: Modify the approach (wait longer, use a different endpoint, simplify the request)
- Try again: Execute with the new approach
- Learn: If it still fails, try a fundamentally different strategy
For example, if an API call fails with a 429 (rate limit), the agent should wait with exponential backoff, honoring the Retry-After header if the server sends one. If it fails with a 404, retrying the same URL won't help, so it should try a different endpoint. If it fails with a 500, it might try a simplified request or a different service entirely.
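The status-code examples above can be sketched as a small lookup plus a backoff calculation. The pairing of actions and strategies, the fallback to `abort`, and the defaults (1 second base, 60 second cap) are assumptions chosen for illustration; the function names are hypothetical.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  jitter: bool = True) -> float:
    """Exponential backoff: base * 2**(attempt - 1), capped at `cap` seconds.

    With jitter enabled, a random delay between 0 and that value is returned,
    which keeps many clients from retrying in lockstep.
    """
    delay = min(cap, base * 2 ** (attempt - 1))
    return random.uniform(0, delay) if jitter else delay


def choose_retry_strategy(status_code: int, attempt: int) -> dict:
    """Map an HTTP status code to a retry decision, following the examples in the text."""
    if status_code == 429:  # rate limited: wait, then retry the same call
        return {"action": "retry", "strategy": "exponential_backoff",
                "wait_seconds": backoff_delay(attempt, jitter=False)}
    if status_code == 404:  # not found: the same URL will keep failing
        return {"action": "pivot", "strategy": "different_endpoint",
                "wait_seconds": 0.0}
    if 500 <= status_code < 600:  # server error: try an easier request
        return {"action": "retry", "strategy": "simplified_request",
                "wait_seconds": 0.0}
    # Anything else is unexpected here; hand it to a human (assumed fallback).
    return {"action": "abort", "strategy": "manual_review", "wait_seconds": 0.0}
```

Jitter is switched off inside `choose_retry_strategy` only to keep the example deterministic; in production you would likely leave it on.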
Real-World Example
Here's a simplified example from Fantomu. An agent needs to fetch user data from an API:
- Observe: Check if we have a valid API token and the user ID
- Evaluate: Determine we need to make an API call
- Decide: Choose to call the primary endpoint with an exponential backoff strategy
- Act: Make the API call
- Reflect: Got a 429 error (rate limited)
- Observe: We're being rate limited, and one attempt has failed so far
- Evaluate: We should wait and retry
- Decide: Wait 2 seconds, then retry with the same strategy
- Act: Wait, then retry
- Reflect: Got a 200 response with user data
- Goal achieved: Exit loop
This is a simple example, but it shows how the agent adapts based on what it discovers, rather than just following a script.
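To make the trace above concrete, here is a sketch that replays it against a scripted stand-in for the API. Nothing here is Fantomu's real code: `fetch_user_with_loop`, the 2-second base delay, and the fake user record are all assumptions for illustration. The API call and the sleep function are injected, so the example runs instantly and without network access.

```python
def fetch_user_with_loop(call_api, sleep, max_iterations=5, base_delay=2.0):
    """Fetch user data, backing off on 429 responses. Returns (data, trace)."""
    trace = []
    for attempt in range(1, max_iterations + 1):
        status, body = call_api()                   # Act
        trace.append(f"attempt {attempt}: HTTP {status}")
        if status == 200:                           # Reflect: goal achieved
            return body, trace
        if status != 429:                           # Evaluate: not something waiting fixes
            trace.append("unrecoverable status, aborting")
            break
        if attempt < max_iterations:                # Decide: wait, then retry
            delay = base_delay * 2 ** (attempt - 1)
            trace.append(f"rate limited, waiting {delay:g}s")
            sleep(delay)
    return None, trace


# Scripted stand-in for the API: the first call is rate limited, the second succeeds.
responses = iter([(429, None), (200, {"id": 42, "name": "Ada"})])
waits = []  # records requested delays instead of actually sleeping
data, trace = fetch_user_with_loop(lambda: next(responses), waits.append)

for line in trace:
    print(line)
```

In real use you would pass `time.sleep` as the `sleep` argument and a function that performs the HTTP request as `call_api`.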
Lessons Learned
Building decision loops is hard. It requires careful design, extensive testing, and constant iteration. But when it works, it's magical—the agent feels genuinely intelligent.
Start simple. Build a basic loop first, then add complexity only when you need it. Use structured outputs everywhere. And always, always include safeguards.
Your future self—and your users—will thank you.
