This is a demo, not an essay — but a few notes on what I learned while building it.

The architecture

A planner agent decomposes a research question into sub-questions. Worker agents fan out to retrieve evidence in parallel. A synthesizer agent merges the results into a single answer with citations.
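That topology can be sketched in a few lines. This is a minimal illustration, not the demo's actual code: plan, retrieve, and synthesize are hypothetical stand-ins for the real LLM-backed agents.

```python
from concurrent.futures import ThreadPoolExecutor

def plan(question):
    # Planner agent: decompose the question into sub-questions.
    # Stubbed here; a real version would call an LLM.
    return [f"{question} (aspect {i})" for i in range(3)]

def retrieve(sub_question):
    # Worker agent: fetch evidence for one sub-question.
    return {"question": sub_question, "evidence": f"evidence for {sub_question!r}"}

def synthesize(question, results):
    # Synthesizer agent: merge worker results into one cited answer.
    citations = [r["evidence"] for r in results]
    return f"Answer to {question!r} [{len(citations)} citations]"

def research(question):
    sub_questions = plan(question)
    with ThreadPoolExecutor() as pool:  # workers fan out in parallel
        results = list(pool.map(retrieve, sub_questions))
    return synthesize(question, results)
```

The parallel fan-out is the only structural commitment here; everything else is a stub.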

What surprised me

The biggest unlock wasn't the agent topology — it was the retry semantics. When a worker agent failed, naïvely re-running it produced the same failure. Adding a small "reflection" step where the worker explained why it failed before retrying improved success rates by ~30%.
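The retry-with-reflection idea is roughly this, sketched under the assumption that a worker accepts a hint argument; in practice the reflection text would come from asking an LLM to explain the failure, not from the raw exception alone.

```python
def run_with_reflection(worker, task, max_retries=2):
    """Retry a failing worker, feeding a reflection on the last
    failure into the next attempt. Names here are illustrative."""
    reflection = None
    for attempt in range(max_retries + 1):
        try:
            return worker(task, hint=reflection)
        except Exception as exc:
            # Reflection step: instead of a blind re-run, carry an
            # explanation of the failure into the next attempt.
            reflection = f"Previous attempt failed with: {exc}"
    raise RuntimeError(f"gave up after {max_retries + 1} attempts")
```

The blind retry the paragraph describes is this same loop with the hint always set to None, which is why it kept reproducing the same failure.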

What I'd do differently

I'd start with a simpler tool-use loop and only introduce multi-agent structure when I had concrete evidence that a single agent was the bottleneck. Multi-agent is seductive but expensive — both in latency and in debuggability.