Module 2: Application & Agent Architectures
Designing an LLM application is not just about picking a model—it’s about picking (and often combining) the right architecture pattern along a spectrum that runs from a single LLM call to fully-autonomous multi-agent swarms.
This chapter unifies the perspectives from three excellent deep-dives: Harrison Chase's writing on agent architectures, Anthropic's Building Effective Agents, and Philipp Schmid's Zero to One: Learning Agentic Patterns (all linked under Further Reading below).
Below is a distilled map, guidance on when to stop at a workflow vs. when to move to an agent, and concrete patterns you can apply in Langfuse-instrumented projects.
The Architecture Ladder
The three sources use slightly different names; we merge them into six rungs of increasing “agency.”
Rule of thumb – climb only as high as you need:
- Workflows (R1-R4) shine when you value predictability, testability, low latency, and tight context control.
- Agents (R5-R6) shine when the path is unknown a priori, tooling decisions are dynamic, or the user expects open-ended autonomy.
Canonical Patterns
| Pattern | Typical Use-Case | Key Pros | Key Cons |
|---|---|---|---|
| Prompt Chaining | Deterministic multi-step doc generation | Easy to debug | Rigid, brittle when input drifts |
| Routing / Handoff | Tier-1 support → specialised prompts | Cheap requests go to smaller models | Mis-routing tanks quality |
| Parallelisation | Map-reduce summarisation, guardrails | Reduces latency | Cost × N, aggregation complexity |
| Evaluator–Optimizer | "Draft → critique → revise" loops | Builds quality offline or online | Adds tokens & delay |
| Orchestrator–Workers | Retrieval + synthesis workflows | Clear separation of concerns | Needs robust state passing |
| Tool-Calling ReAct | One-shot Q&A with calculator / web | Simple mental model | Parsing / hallucination risk |
| Planning Agent | Multi-file code-refactor, research | Deeper reasoning | Planning errors snowball |
| Reflection | Self-consistency, safety checks | Cuts hallucinations | Extra calls and $$ |
| Memory-Augmented | Long customer sessions | Personalised UX | Memory staleness / cost |
| Multi-Agent Swarm | Brainstorming, negotiation sims | Diverse reasoning | Hardest to debug |
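To make the first two rows concrete, here is a minimal sketch of Prompt Chaining and Routing. It assumes a hypothetical `call_llm(prompt, model)` helper wrapping whatever client you use; the prompts, route names, and model names are placeholders, not recommendations.

```python
from typing import Callable

# Hypothetical helper -- wire up your actual LLM client here.
def call_llm(prompt: str, model: str = "small-model") -> str:
    raise NotImplementedError("plug in your LLM client")

# --- Prompt Chaining: fixed steps, each output feeds the next prompt. ---
def write_report(topic: str) -> str:
    outline = call_llm(f"Write a bullet outline for a report on: {topic}")
    draft = call_llm(f"Expand this outline into a full draft:\n{outline}")
    return call_llm(f"Tighten the prose and fix grammar:\n{draft}")

# --- Routing / Handoff: a cheap classifier picks the downstream prompt and model. ---
ROUTES: dict[str, Callable[[str], str]] = {
    "billing":   lambda q: call_llm(f"You are a billing specialist. Answer: {q}", model="small-model"),
    "technical": lambda q: call_llm(f"You are a support engineer. Answer: {q}", model="large-model"),
    "other":     lambda q: call_llm(f"Answer politely: {q}", model="small-model"),
}

def route(question: str) -> str:
    label = call_llm(
        "Classify the question as exactly one of: billing, technical, other.\n"
        f"Question: {question}\nLabel:"
    ).strip().lower()
    handler = ROUTES.get(label, ROUTES["other"])  # fall back when the router mis-fires
    return handler(question)
```

The fallback route is the cheapest defence against the mis-routing failure mode listed in the table.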
Selecting the Right Approach
- Define “good” first. Accuracy? Cost? Latency? Trust?
- Prototype as R1 (single call). Measure offline with Langfuse datasets (see the sketch after this list).
- When the metric plateaus, move to R2 → R3.
- Adopt agents only if the task cannot be expressed as a bounded graph.
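A minimal sketch of steps 1-2, assuming a hypothetical `answer()` single-call function and a tiny in-memory dataset with a crude containment metric; in practice you would store the dataset in Langfuse and attach scores to the traced runs.

```python
# Tiny offline eval harness for the R1 (single-call) prototype.
DATASET = [
    {"input": "What is 2 + 2?", "expected": "4"},
    {"input": "What is the capital of France?", "expected": "Paris"},
]

def answer(question: str) -> str:
    # Placeholder for the single LLM call under evaluation.
    raise NotImplementedError

def evaluate() -> float:
    hits = 0
    for item in DATASET:
        prediction = answer(item["input"])
        hits += int(item["expected"].lower() in prediction.lower())  # crude containment metric
    return hits / len(DATASET)

# Re-run after every prompt or architecture change; only climb a rung when this plateaus.
```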
“The hard part of reliable agents is passing the right context at every step.” — Harrison Chase
Langfuse provides the tracing you need to see that context. Every node/tool invocation you build becomes a traced span that you can later debug, evaluate, and cost-optimise.
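As an illustration, here is a minimal tracing sketch assuming the `@observe` decorator from the Langfuse Python SDK (v2-style import) and Langfuse credentials in the environment; the functions and their bodies are placeholders.

```python
from langfuse.decorators import observe  # Langfuse Python SDK, v2-style import

@observe()  # nested span: tool invocation
def search_docs(query: str) -> str:
    return "placeholder retrieved snippets"  # swap in real retrieval

@observe()  # nested span: synthesis node
def synthesize(question: str, context: str) -> str:
    return f"Answer to {question!r} grounded in {len(context)} chars of context"

@observe()  # top-level trace for the whole workflow
def qa_workflow(question: str) -> str:
    context = search_docs(question)
    return synthesize(question, context)

print(qa_workflow("How do I rotate my API key?"))
```

Each decorated call appears as a span nested under one trace, so you can see exactly which context the synthesis step received.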
Implementation Tips (from all three sources)
- Tool schema = prompt. Document args, edge-cases, examples.
- Guardrails hierarchy: JSON schema → allow-list APIs → max-iterations → human-approval (see the sketch after this list).
- Persist state (checkpoints) for fault-tolerance and to enable offline re-runs in Langfuse.
- Add reflection early. A cheap 2nd-model critique catches many hallucinations.
- Cost caps. Track `usage.total_cost` in traces; autonomy creep is real.
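Here is a minimal sketch combining the max-iterations, allow-list, and cost-cap guardrails in one loop. `call_llm` and its per-call `cost` field are placeholders; adapt the JSON check to whatever structured-output validation you already use.

```python
import json

ALLOWED_TOOLS = {"search_docs", "calculator"}  # allow-list of callable APIs
MAX_ITERATIONS = 5                             # hard stop on runaway loops
MAX_COST_USD = 0.50                            # budget per run

def call_llm(messages: list[dict]) -> dict:
    # Placeholder: should return {"content": str, "cost": float}
    raise NotImplementedError

def run_agent(task: str) -> str:
    messages = [{"role": "user", "content": task}]
    total_cost = 0.0
    for _ in range(MAX_ITERATIONS):
        response = call_llm(messages)
        total_cost += response["cost"]
        if total_cost > MAX_COST_USD:
            return "Stopped: cost cap exceeded; escalate to a human."
        try:
            # Expect {"tool": ..., "args": ...} or {"final": ...}
            action = json.loads(response["content"])
        except json.JSONDecodeError:
            messages.append({"role": "user", "content": "Reply with valid JSON only."})
            continue
        if "final" in action:
            return action["final"]
        if action.get("tool") not in ALLOWED_TOOLS:
            messages.append({"role": "user", "content": f"Tool not allowed; choose from {sorted(ALLOWED_TOOLS)}."})
            continue
        # Execute the allowed tool here and append its result to messages.
        messages.append({"role": "user", "content": f"(tool result for {action['tool']} goes here)"})
    return "Stopped: max iterations reached; escalate to a human."
```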
Further Reading
- Harrison Chase, Agent architectures (tweet thread)
- Anthropic, Building Effective Agents (2024-12)
- Philipp Schmid, Zero to One: Learning Agentic Patterns (2025-05)
These links are the perfect starting points if you want to dive deeper or port the mermaid diagrams above into code (Phil provides full Python snippets, Harrison shows LangGraph recipes, Anthropic offers high-level design guidance).