Agent Orchestration Deep Dive

Two orchestration tools compared against our setup, and a hybrid architecture proposal that takes the best of each: deterministic dispatch, LLM judgment, and fizzy-pop coordination.

Symphony (openai/symphony) · Paperclip (paperclipai/paperclip) · OpenClaw (baseline) · Hybrid (proposed)

Orchestration Cost: LLM vs Deterministic

Symphony

0 tokens

Pure Elixir daemon. Poll → reconcile → dispatch is deterministic code. LLM tokens are spent only by Codex doing actual work.

Paperclip

0 tokens

Node.js server + Postgres. Scheduling, budget enforcement, and task checkout are all code-based. Tokens are spent only when agents execute tasks.

OpenClaw

~1.8M/mo

Every decision is an LLM call. Each heartbeat loads the system prompt (~10K tokens) plus polling and triage context, roughly 15K tokens in total. At 4 heartbeats/day, that is ≈ 60K input tokens/day (≈ 1.8M/month) of pure overhead.

The tradeoff: Symphony and Paperclip can poll every 30 seconds at no token cost. OpenClaw's value is judgment: deciding what to work on, not just when. But for mechanical dispatch loops, deterministic code wins on cost and speed.

Real-World Token Waste: Symphony's Workpad Bug

George (@odysseus0z) found that Symphony's workpad edits resend the full body on every update. On big tickets (>5K tokens), each checkbox toggle burns 5,000+ tokens. He wrote a fix (file-based workpad sync) but can't submit the PR directly to openai/symphony because of repository restrictions.

Source: x.com/odysseus0z, Mar 12, 2026

Lesson: Even "zero-token orchestration" tools have hidden token costs in their agent protocol. The orchestrator is cheap but the agent-to-tracker communication can be wasteful. This is a design-level concern, not just a billing concern.
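To make the workpad cost concrete, the sketch below compares a full-body update with a line-level diff for a single checkbox toggle. It is a minimal Python illustration under stated assumptions: tokens are approximated as characters / 4 (a rough heuristic, not a real tokenizer), the workpad contents are invented, and nothing here reflects Symphony's or Linear's actual API or George's patch.

```python
import difflib


def approx_tokens(text: str) -> int:
    # Rough heuristic (assumption): ~4 characters per token.
    return max(1, len(text) // 4)


def full_body_cost(new_body: str) -> int:
    # Full-body sync: the entire workpad is resent on every edit.
    return approx_tokens(new_body)


def diff_cost(old_body: str, new_body: str) -> int:
    # Diff-based sync: only changed lines (zero context lines) are sent.
    diff = difflib.unified_diff(
        old_body.splitlines(), new_body.splitlines(), lineterm="", n=0
    )
    return approx_tokens("\n".join(diff))


if __name__ == "__main__":
    # Hypothetical workpad with 400 checklist items.
    old = "\n".join(f"- [ ] step {i}: do the thing for item {i}" for i in range(400))
    new = old.replace("- [ ] step 7:", "- [x] step 7:", 1)  # toggle one box
    print("full body:", full_body_cost(new), "tokens")
    print("diff only:", diff_cost(old, new), "tokens")
```

The gap grows with the workpad: full-body sync costs O(workpad size) per edit, while a diff costs O(size of the change).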


Symphony: The Focused Scheduler

OpenAI's spec-driven orchestrator. Polls Linear, isolates work into per-issue workspaces, runs Codex autonomously.

openai/symphony · SPEC v1 Draft · Elixir/OTP · Apache 2.0 · Codex only · Linear only

GitHub · SPEC.md · Elixir README · Ryan Carson's tweet

How It Works

Every polling.interval_ms (default 30s), this exact sequence runs:

Step 1: Reconcile. Check running agents against tracker state. Kill stale runs.
Step 2: Validate. Pre-flight config check. If broken, skip dispatch but keep reconciling.
Step 3: Fetch. Query Linear for active issues. Sort by priority → created_at → identifier.
Step 4: Dispatch. Launch Codex in isolated workspaces until concurrency slots are exhausted.
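The four steps can be sketched as one tick function. This is a simplified Python illustration, not Symphony's Elixir code: the `Issue` fields, the callbacks, and the priority convention (lower number = more urgent, unset priority sorts last) are assumptions, and "kill" is reduced to dropping the entry from the in-memory map.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Issue:
    identifier: str
    state: str
    priority: Optional[int]  # assumption: 1 = most urgent, None = unset
    created_at: str          # ISO-8601, so string order == time order


def sort_key(issue: Issue):
    # priority -> created_at -> identifier; unset priority sorts last.
    return (issue.priority is None, issue.priority or 0,
            issue.created_at, issue.identifier)


def tick(
    running: dict[str, Issue],
    fetch_states: Callable[[list[str]], dict[str, str]],
    fetch_candidates: Callable[[], list[Issue]],
    config_ok: bool,
    max_agents: int,
    terminal_states: set[str],
    launch: Callable[[Issue], None],
) -> None:
    # Step 1: reconcile running work against current tracker state.
    states = fetch_states(list(running))
    for ident in list(running):
        if states.get(ident) in terminal_states:
            del running[ident]  # stand-in for killing the agent
    # Step 2: validate; a broken config skips dispatch (reconcile already ran).
    if not config_ok:
        return
    # Step 3: fetch candidates and sort them.
    candidates = sorted(fetch_candidates(), key=sort_key)
    # Step 4: dispatch until concurrency slots are exhausted.
    for issue in candidates:
        if len(running) >= max_agents:
            break
        if issue.identifier in running:
            continue
        running[issue.identifier] = issue
        launch(issue)
```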

System Components

Workflow Loader

Reads WORKFLOW.md from your repo. Parses YAML front matter (config) + Markdown body (prompt template). Watches for changes and hot-reloads without a restart.

Config Layer

Typed getters for all workflow settings. Handles defaults, $ENV_VAR expansion, path normalization. Validates before every dispatch tick.
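A stdlib-only Python sketch of the loader and config layer: split the front matter from the prompt body, then expand `$ENV_VAR` references and `~` in path settings. The function names are hypothetical, and YAML parsing is deliberately omitted (a real loader would hand the front-matter string to a YAML parser).

```python
import os


def split_workflow(text: str) -> tuple[str, str]:
    """Split a WORKFLOW.md string into (front_matter, prompt_body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text  # no front matter: the whole file is the prompt
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            front = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:]).lstrip("\n")
            return front, body
    raise ValueError("unterminated front matter")


def normalize_path(value: str) -> str:
    # Expand $VARS first, then ~, then make the path absolute.
    return os.path.abspath(os.path.expanduser(os.path.expandvars(value)))
```

Note that `os.path.expandvars` leaves unknown variables untouched, so a stricter config layer would reject any `$` that survives expansion.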

Issue Tracker Client

Linear-only. Fetches candidates in active states, refreshes states for reconciliation, fetches terminal issues for cleanup. Everything normalized into a stable domain model.

Orchestrator

The brain. Owns the poll tick, all in-memory state, dispatch/retry/stop decisions, and token accounting. Single authority: no external DB required.

Workspace Manager

Maps issue identifiers to filesystem paths. Creates directories, runs lifecycle hooks (after_create, before_run, after_run, before_remove). All paths validated inside workspace root.

Agent Runner

Creates workspace → builds prompt → launches codex app-server via JSON-RPC over stdio → streams events back. Supports multi-turn sessions on the same thread (up to max_turns).

WORKFLOW.md โ€” The Single Source of Truth

Agent behavior version-controlled with your code. Each repo can have different agent policies.

WORKFLOW.md: Front Matter (Config)

---
tracker:
  kind: linear
  project_slug: "my-project"
  active_states: ["Todo", "In Progress"]
  terminal_states: ["Done", "Closed", "Cancelled"]
polling:
  interval_ms: 30000
workspace:
  root: ~/code/workspaces
hooks:
  after_create: |
    git clone --depth 1 git@github.com:org/repo.git .
    npm install
  before_run: |
    git checkout main && git pull
agent:
  max_concurrent_agents: 5
  max_turns: 20
codex:
  command: codex app-server
  approval_policy: never
  thread_sandbox: workspace-write
  stall_timeout_ms: 300000
---

WORKFLOW.md: Prompt Template (Markdown)

You are working on {{ issue.identifier }}.

## Task

**Title:** {{ issue.title }}

**Description:** {{ issue.description }}

## Instructions

1. Create a feature branch
2. Implement the changes
3. Write tests
4. Open a PR with a clear description
5. Move the issue to "Human Review"

{% if attempt %}
## Retry Context

This is attempt #{{ attempt }}. Check previous work and continue.
{% endif %}
Key design: a Liquid-compatible template with strict variable checking. The issue object includes all normalized fields (labels, blockers, priority). Unknown variables or filters fail the render; there are no silent fallbacks.
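The strict-rendering rule can be illustrated with a toy renderer. This Python sketch handles only `{{ dotted.path }}` interpolation (no filters, tags, or `{% if %}` blocks, so it is not a Liquid implementation) and raises on any unknown variable instead of rendering an empty string. `render_strict` and `TemplateError` are hypothetical names.

```python
import re

# Matches {{ name }} or {{ name.field.subfield }} with optional spaces.
_VAR = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


class TemplateError(KeyError):
    """Raised when a template references a variable that does not exist."""


def render_strict(template: str, context: dict) -> str:
    def lookup(match):
        value = context
        for part in match.group(1).split("."):
            if not isinstance(value, dict) or part not in value:
                # Unknown variable: fail the render, no silent fallback.
                raise TemplateError(match.group(1))
            value = value[part]
        return str(value)

    return _VAR.sub(lookup, template)
```

A typo such as `issue.descripton` therefore fails loudly at render time instead of sending the agent a prompt with a blank description.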

Issue Orchestration States

Symphony's internal claim states โ€” separate from your Linear states (Todo, In Progress, etc.).

flowchart TD
    U["Unclaimed"]
    C["Claimed"]
    R["Running"]
    RQ["RetryQueued"]
    REL["Released"]

    U -->|"dispatch eligible"| C
    C -->|"worker launched"| R
    R -->|"normal exit"| RQ
    R -->|"abnormal exit"| RQ
    R -->|"terminal state"| REL
    R -->|"stall timeout"| RQ
    RQ -->|"timer + eligible"| R
    RQ -->|"timer + ineligible"| REL

    classDef unclaimed fill:#f0f0f0,stroke:#999,color:#333
    classDef claimed fill:#dbeafe,stroke:#3b82f6,color:#1e3a8a
    classDef running fill:#dcfce7,stroke:#22c55e,color:#166534
    classDef retry fill:#fef3c7,stroke:#f59e0b,color:#92400e
    classDef released fill:#f3f4f6,stroke:#9ca3af,color:#6b7280

    class U unclaimed
    class C claimed
    class R running
    class RQ retry
    class REL released
        

Retry Logic

Normal exit: 1-second continuation retry to re-check issue state.
Abnormal exit: exponential backoff, min(10s × 2^(n-1), 5 min), where n is the retry attempt number (starting at 1).
Before re-dispatch: Re-fetches candidates, checks eligibility + slots.
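The two retry rules translate directly into a delay function. A minimal Python sketch; the function name and the `"normal"`/`"abnormal"` labels are hypothetical. For abnormal exits it yields 10, 20, 40, 80, 160 seconds, then caps at 300.

```python
def retry_delay_s(exit_kind: str, attempt: int) -> float:
    """Seconds to wait before the next dispatch attempt (attempt counts from 1)."""
    if exit_kind == "normal":
        return 1.0  # continuation retry: quickly re-check the issue state
    if attempt < 1:
        raise ValueError("attempt counts from 1")
    # Abnormal exit: min(10 s * 2^(n-1), 5 min)
    return float(min(10 * 2 ** (attempt - 1), 300))
```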

Reconciliation (every tick)

Stall detection: no Codex events for stall_timeout_ms → kill + retry.
State refresh: fetch the current state of every running issue. Terminal → kill + clean. Active → update snapshot. Unknown → kill, keep workspace.
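One way to express the per-run reconciliation decision, as a Python sketch. Assumptions: timestamps are in seconds, "unknown" is read as "neither active nor terminal" (for example, an issue moved to "Human Review"), and terminal/unknown states are checked before stall detection so a finished issue is cleaned up rather than retried. The names are hypothetical.

```python
from enum import Enum


class Action(Enum):
    KEEP = "update snapshot"
    KILL_AND_RETRY = "kill + retry"
    KILL_AND_CLEAN = "kill + remove workspace"
    KILL_KEEP_WORKSPACE = "kill, keep workspace"


def reconcile(state: str, active: set[str], terminal: set[str],
              last_event_at: float, now: float, stall_timeout_ms: int) -> Action:
    if state in terminal:
        return Action.KILL_AND_CLEAN        # issue finished: stop and clean up
    if state not in active:
        return Action.KILL_KEEP_WORKSPACE   # "unknown": stop, keep the work
    if (now - last_event_at) * 1000 >= stall_timeout_ms:
        return Action.KILL_AND_RETRY        # stalled: no agent events in time
    return Action.KEEP                      # healthy: refresh the snapshot
```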

What Makes It Interesting

Multi-Turn Sessions

After a turn completes, the worker checks whether the issue is still active and, if so, starts another turn on the same thread (up to max_turns; 20 in the example config). The first turn gets the full prompt; continuations get guidance only.

linear_graphql Tool

Client-side tool that lets Codex make raw GraphQL calls to Linear using Symphony's auth. The agent updates tickets, posts comments, manages state โ€” without needing its own API key.

Blocker Awareness

Issues in "Todo" state with non-terminal blockers are skipped. The orchestrator checks each candidate's blockers before dispatching.
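The blocker rule as a filter over candidates, in Python. It assumes each candidate carries the current states of its blocking issues; the `Candidate` type is hypothetical, not Symphony's domain model.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    identifier: str
    state: str
    blocker_states: list[str] = field(default_factory=list)


def dispatchable(cands: list[Candidate], terminal_states: set[str]) -> list[Candidate]:
    # Only "Todo" issues are gated on blockers; other active states pass through.
    return [
        c for c in cands
        if not (c.state == "Todo"
                and any(s not in terminal_states for s in c.blocker_states))
    ]
```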

Safety Invariants

The spec's "most important portability constraint": the agent runs only in its per-issue workspace, paths are validated to stay inside the workspace root, and workspace keys are sanitized to [A-Za-z0-9._-] only.
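A Python sketch of the two safety checks, assuming disallowed characters are replaced with `_` (the spec's exact replacement may differ) and using hypothetical function names. Sanitizing alone is not enough: `.` and `..` consist only of allowed characters, so the resolved path must also be checked against the root.

```python
import os
import re


def sanitize_key(identifier: str) -> str:
    # Replace every character outside [A-Za-z0-9._-] with "_".
    return re.sub(r"[^A-Za-z0-9._-]", "_", identifier)


def workspace_path(root: str, identifier: str) -> str:
    root_real = os.path.realpath(root)
    path = os.path.realpath(os.path.join(root_real, sanitize_key(identifier)))
    # Reject anything that resolves to the root itself or outside it.
    if path == root_real or os.path.commonpath([root_real, path]) != root_real:
        raise ValueError(f"workspace path escapes root: {identifier!r}")
    return path
```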


Paperclip: The Company Simulator

Open-source orchestration for "zero-human companies." If Symphony is a scheduler, Paperclip is an entire business operating system.

paperclipai/paperclip · Node.js + React · MIT · Multi-agent · Embedded Postgres

GitHub · paperclip.ing · Docs · George's tweet

How It Thinks

"If OpenClaw is an employee, Paperclip is the company." The key abstraction is a company with an org chart, not a project with tasks.

flowchart TD
    M["Company Mission\n'Build #1 AI note-taking app'"]
    PG["Project Goal\n'Ship collaboration features'"]
    AG["Agent Goal\n'Implement real-time sync'"]
    T["Task\n'Write WebSocket handler'"]

    M --> PG
    PG --> AG
    AG --> T

    CEO["CEO\n(Claude)"]
    CTO["CTO\n(Cursor)"]
    ENG1["Engineer\n(Codex)"]
    ENG2["Engineer\n(OpenClaw)"]

    CEO --> CTO
    CTO --> ENG1
    CTO --> ENG2

    T -.->|"assigned to"| ENG1

    classDef goal fill:#f3e8ff,stroke:#7e22ce,color:#581c87
    classDef agent fill:#faf5ff,stroke:#a855f7,color:#6b21a8
    classDef task fill:#fef3c7,stroke:#f59e0b,color:#92400e

    class M,PG,AG goal
    class CEO,CTO,ENG1,ENG2 agent
    class T task
        

Core Components

Org Chart Engine

Hierarchies, roles, reporting lines. Agents have a boss, title, and job description. Delegation flows up and down the chart. Hiring requires board approval.

Goal Alignment

Every task carries full goal ancestry: Task → Agent Goal → Project Goal → Company Mission. Agents always know what they are working on and why.

Budget System

Monthly budgets per agent. Enforcement is atomic: task checkout and the budget check happen together. At 80% of budget, agents get a soft warning; at 100%, they auto-pause. The board can override.
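The threshold logic is small enough to show in full. A Python sketch; units are whatever the budget is denominated in, and the status names are hypothetical, not Paperclip's.

```python
def budget_status(spent: float, budget: float) -> str:
    """Classify an agent's monthly spend against its budget."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = spent / budget
    if ratio >= 1.0:
        return "paused"   # hard stop at 100%; the board may override
    if ratio >= 0.8:
        return "warning"  # soft warning from 80%
    return "ok"
```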

Ticket System

Every conversation traced, every decision explained. Full tool-call tracing + immutable audit log. Append-only history: no edits, no deletions.

Heartbeat Scheduler

Agents wake on schedule, check work, act. Per-agent heartbeat intervals (e.g., Content Writer every 4h, SEO every 8h). Event triggers: task assignment, @-mentions.

Governance Layer

You're the board. Approve hires, override strategy, pause/terminate any agent. Approval gates enforced, config changes revisioned, bad changes rollback-safe.

Bring Your Own Agent

"If it can receive a heartbeat, it's hired." This is the key difference from Symphony.

6+ agent types · ∞ companies per deploy · 1 command to start

OpenClaw

Continuous agent with its own heartbeat. Hooks into Paperclip for task coordination.

Claude Code

Coding agent sessions. Paperclip manages task assignment and context.

Codex

OpenAI's coding agent. The same agent Symphony uses, but orchestrated differently.

Cursor

Cloud-based IDE agent. On the roadmap for better support.

Bash / Scripts

Any shell command that can receive a heartbeat signal.

HTTP Webhooks

Any service with an HTTP endpoint. Maximum flexibility.

Under the Hood

Atomic Execution

Task checkout + budget enforcement happen atomically. No double-work, no runaway spend. If two agents try to grab the same task, only one wins.
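A common way to get this guarantee from a database is to put the ownership check and the budget check in a single conditional `UPDATE`, so that at most one claimant matches the row. The sketch below uses Python's standard-library `sqlite3` for self-containment (Paperclip itself runs on Postgres); the schema and function are hypothetical and not Paperclip's actual code.

```python
import sqlite3

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE agents (name TEXT PRIMARY KEY, spent REAL NOT NULL, budget REAL NOT NULL);
CREATE TABLE tasks  (id INTEGER PRIMARY KEY, owner TEXT);
"""


def checkout(conn: sqlite3.Connection, task_id: int, agent: str) -> bool:
    """Claim a task only if it is unowned AND the agent is under budget.

    Both conditions live in one UPDATE statement, so the claim and the
    budget check succeed or fail together; a losing claimant matches 0 rows.
    """
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            """
            UPDATE tasks SET owner = :agent
            WHERE id = :task AND owner IS NULL
              AND (SELECT spent < budget FROM agents WHERE name = :agent)
            """,
            {"agent": agent, "task": task_id},
        )
    return cur.rowcount == 1
```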

Persistent Agent State

Agents resume the same task context across heartbeats. No "starting from scratch" each time: the conversation thread, workspace state, and progress all persist.

Runtime Skill Injection

Agents learn Paperclip workflows and project context at runtime via SKILLS.md. They discover their own task context without being pre-trained on the system.

Portable Company Templates

Export/import entire org structures, agent configs, and skills. Secret scrubbing + collision handling built in. Clipmart (coming soon) is a marketplace for pre-built companies.

Setup: npx paperclipai onboard --yes or clone + pnpm dev. Embedded PostgreSQL auto-created. API at localhost:3100. Requirements: Node.js 20+, pnpm 9.15+. Accessible via Tailscale for mobile management.

Three-Way Comparison

What Each Tool Believes

Symphony

"Issues in, PRs out."

Focused scheduler. Does one thing well: turn Linear issues into autonomous Codex runs. No opinions about org structure, budgets, or business goals. Just dispatch.

Paperclip

"Agents have jobs, not chat windows."

Company simulator. Models entire businesses with org charts, governance, goal hierarchies. Agents are employees with bosses, titles, and budgets. You're the board.

OpenClaw

"An assistant that thinks."

LLM-native orchestrator. Every decision involves judgment. Excels at ambiguous tasks, multi-channel communication, and being genuinely helpful. Not just a dispatcher.

Detailed Comparison

Capability | Symphony | Paperclip | OpenClaw + Fizzy
--- | --- | --- | ---
Orchestration cost | 0 tokens (deterministic) | 0 tokens (deterministic) | ~1.8M tokens/mo overhead
Agent support | Codex only | Any agent (OpenClaw, Claude, Codex, Cursor, Bash, HTTP) | Any agent (Claude Code, Codex, Amp, etc.)
Issue tracker | Linear only | Built-in tickets (BYOT on roadmap) | GitHub Issues + Fizzy cards
Workspace isolation | Per-issue directories + hooks | Task-level, implementation varies | Sub-agent spawns, no worktrees
In-repo workflow policy | WORKFLOW.md versioned with code | SKILLS.md injection at runtime | Config in OpenClaw, separate from repos
Reconciliation | Every tick: stall + state refresh | Heartbeat-based check-ins | Fire-and-forget
Budget enforcement | None | Per-agent monthly budgets, atomic | Reporting only (codexbar)
Goal hierarchy | None (flat issues) | Mission → Project → Agent → Task | Informal (tags + conventions)
Org chart / roles | None | Full hierarchy with reporting lines | AGENTS.md (informal)
Governance / approvals | None (trusted environment) | Board approval, hire gates, rollback | Manual oversight via chat
Per-task cost tracking | Per-session token counters | Per-agent, per-task, per-project | Aggregate via codexbar
Multi-turn sessions | Same thread, up to max_turns | Persistent across heartbeats | Sub-agent sessions
Retry logic | Exponential backoff + continuation | Heartbeat-based re-check | Basic re-run
Hot reload | WORKFLOW.md file watch | Config revisioned + rollback | config.patch with restart
Dashboard | TUI + Phoenix LiveView + JSON API | React web UI + mobile-ready | Telegram messages + codexbar
Messaging integration | None | None (ticket-based) | Telegram, WhatsApp, Discord, etc.
Multi-company / multi-project | Single project per instance | Unlimited companies, full isolation | Multi-agent, single workspace
Audit trail | Structured logs | Immutable append-only log | Session logs (searchable)
LLM judgment | None (mechanical dispatch) | None (rule-based) | Every decision involves reasoning
Setup | Elixir + mise + Linear API key | npx paperclipai onboard | Already running

Three Different Approaches

flowchart TD
    subgraph S["Symphony"]
        direction TB
        SL["Linear\n(issue tracker)"]
        SO["Orchestrator\n(Elixir daemon)"]
        SW["Workspaces\n(per-issue dirs)"]
        SC["Codex\n(app-server)"]
        SL -->|"poll 30s"| SO
        SO -->|"dispatch"| SW
        SW -->|"launch"| SC
    end

    subgraph P["Paperclip"]
        direction TB
        PB["Board\n(you)"]
        PM["Company\n(org + goals)"]
        PT["Tickets\n(built-in)"]
        PA["Agents\n(any runtime)"]
        PB -->|"govern"| PM
        PM -->|"delegate"| PT
        PT -->|"heartbeat"| PA
    end

    subgraph O["OpenClaw"]
        direction TB
        OC["Chat\n(Telegram/etc)"]
        OA["LLM Agent\n(Opus/Sonnet)"]
        OF["Fizzy / GitHub\n(task source)"]
        OS["Sub-agents\n(any coding agent)"]
        OC -->|"request"| OA
        OA -->|"read"| OF
        OA -->|"spawn"| OS
    end

    classDef symphony fill:#dbeafe,stroke:#1e3a5f,color:#1e3a5f
    classDef paperclip fill:#f3e8ff,stroke:#7e22ce,color:#581c87
    classDef openclaw fill:#ccfbf1,stroke:#0d7377,color:#0d7377

    class SL,SO,SW,SC symphony
    class PB,PM,PT,PA paperclip
    class OC,OA,OF,OS openclaw
        

What's Worth Stealing

From Symphony

  • In-repo WORKFLOW.md: per-repo agent behavior, version-controlled
  • Per-issue workspaces: git worktrees, no cross-contamination
  • Reconciliation loop: auto-kill stale/terminal runs every tick
  • Lifecycle hooks: after_create, before_run, after_run
  • Blocker-aware dispatch: skip issues with unresolved dependencies

From Paperclip

  • Budget enforcement: atomic per-agent limits that actually stop runaway spend
  • Goal ancestry: task context that traces back to "why"
  • Audit trail: immutable, append-only decision logs
  • Multi-agent flexibility: any agent that can receive a heartbeat

The Hybrid โ€” Proposed Architecture

Spend tokens on judgment, spend code on mechanics. Deterministic dispatch + LLM supervision + fizzy-pop coordination.

Fizzy as task source · fizzy-pop for agent comms · Zero-token dispatch · OpenClaw for judgment · Multi-agent

Three Layers, One System

The key insight: most orchestration is mechanical (poll, check, dispatch, retry). Only a small fraction needs actual reasoning. Separate them.

flowchart TD
    subgraph L1["Layer 1: Deterministic Dispatcher (0 tokens)"]
        direction LR
        FZ["Fizzy Cards\n(bot-actionable)"]
        DP["Dispatcher\n(cron script)"]
        WS["Workspace\nManager"]
        FZ -->|"poll every 5 min"| DP
        DP -->|"create worktree"| WS
    end

    subgraph L2["Layer 2: Agent Execution (work tokens)"]
        direction LR
        CX["Codex"]
        CC["Claude Code"]
        AMP["Amp"]
        OC["OpenCode"]
    end

    subgraph L3["Layer 3: fizzy-pop + OpenClaw (judgment tokens)"]
        direction LR
        FP["fizzy-pop\n(notifications)"]
        OP["OpenClaw\n(Optimus)"]
        ZN["Zain\n(human)"]
        FP -->|"mentions, assigns"| OP
        OP -->|"escalate"| ZN
    end

    DP -->|"route + launch"| L2
    L2 -->|"done / blocked / failed"| FP
    OP -->|"review PRs\njudgment calls"| L2

    classDef layer1 fill:#fef3c7,stroke:#b8860b,color:#78350f
    classDef layer2 fill:#dbeafe,stroke:#1e3a5f,color:#1e3a5f
    classDef layer3 fill:#dcfce7,stroke:#166534,color:#166534

    class FZ,DP,WS layer1
    class CX,CC,AMP,OC layer2
    class FP,OP,ZN layer3
        

What Each Layer Does

Layer 1: Deterministic Dispatcher (0 tokens)

A lightweight script on a cron. Does everything Symphony's orchestrator does, but against Fizzy instead of Linear.

Every tick (5 min):

  • Poll Fizzy for bot-actionable cards
  • Resolve blocked-by:NN tags โ€” skip blocked cards
  • Check concurrency limits (max N agents running)
  • Check per-agent budget caps
  • Create per-card git worktree
  • Run before_run hook (git pull, deps)
  • Route card to agent (tag-based rules)
  • Launch agent in workspace
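A sketch of the per-tick selection step in Python. Everything here is hypothetical: the `Card` shape, oldest-card-first ordering, and the injected `is_blocked` and `route` callables (which would wrap the blocked-by:NN check and the tag router). Worktree creation, hooks, and the actual launch would follow for each returned pair.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Card:
    number: int
    tags: list[str] = field(default_factory=list)
    bot_actionable: bool = True


def select_cards(
    cards: list[Card],
    running: set[int],
    max_running: int,
    is_blocked: Callable[[Card], bool],
    route: Callable[[list[str]], str],
    spend: dict[str, float],
    caps: dict[str, float],
) -> list[tuple[Card, str]]:
    """Choose (card, agent) pairs to launch this tick, oldest card first."""
    picks: list[tuple[Card, str]] = []
    free_slots = max_running - len(running)
    for card in sorted(cards, key=lambda c: c.number):
        if free_slots <= 0:
            break  # concurrency limit reached
        if not card.bot_actionable or card.number in running or is_blocked(card):
            continue
        agent = route(card.tags)  # tag-based routing rule
        if spend.get(agent, 0) >= caps.get(agent, float("inf")):
            continue  # agent is at its budget cap
        picks.append((card, agent))
        free_slots -= 1
    return picks
```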

Reconciliation (every tick):

  • Check card state: if Done/Closed, kill the running agent
  • Stall detection: no output for N minutes → kill + retry
  • Track per-card token spend from agent output
  • Run after_run hook on completion
  • On failure: exponential backoff retry
  • On success: comment on Fizzy card with results
  • Clean up worktrees for completed/terminal cards

State is stored in memory/dispatcher-state.json. No database. On restart, the dispatcher recovers by re-scanning Fizzy and the filesystem.
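Because the state file is the dispatcher's only persistence, it should never be left half-written if the process dies mid-save. A minimal Python sketch, assuming JSON content and hypothetical function names: write to a temp file in the same directory, then swap it in with `os.replace`. A missing or corrupt file falls back to empty state, matching the recover-by-re-scan design.

```python
import json
import os
import tempfile


def save_state(path: str, state: dict) -> None:
    """Write state via a temp file in the same directory, then os.replace."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f, indent=2, sort_keys=True)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp)
        raise


def load_state(path: str) -> dict:
    """Missing or corrupt state is not fatal: start empty and re-scan."""
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}
```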

Layer 2: Agent Execution (work tokens, unchanged)

The actual coding agents do the work. Nothing changes here; they just get launched by the dispatcher instead of by Optimus.

Agent routing (tag-based):

  • code → Codex or Claude Code
  • enhancement → Claude Code
  • chore → Amp (free credits)
  • research → Perceptor
  • bug → Codex (fast iteration)
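The routing rules as an ordered table, in Python. The agent identifiers are hypothetical, and two behaviors are assumptions not specified above: more specific tags (`bug`, `enhancement`, `chore`, `research`) are checked before the generic `code` tag, and a card that matches no rule returns an empty list, leaving it for OpenClaw to triage.

```python
# Ordered rules: the first matching tag wins. Candidate lists keep the
# "Codex or Claude Code" choice open for a later step (e.g. load or budget).
ROUTES: list[tuple[str, list[str]]] = [
    ("bug", ["codex"]),
    ("enhancement", ["claude-code"]),
    ("chore", ["amp"]),
    ("research", ["perceptor"]),
    ("code", ["codex", "claude-code"]),
]


def route(tags: list[str]) -> list[str]:
    for tag, agents in ROUTES:
        if tag in tags:
            return agents
    return []  # no rule matched: leave the card for judgment (assumption)
```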

Per-repo customization:

  • Optional WORKFLOW.md in repo root
  • Defines prompt template, hooks, agent prefs
  • Falls back to default prompt if absent
  • Version-controlled with the code

Layer 3: fizzy-pop + OpenClaw Judgment (judgment tokens only)

fizzy-pop is already our inter-agent communication layer. Bots assign cards, mention each other, and coordinate via Fizzy notifications. OpenClaw steps in only when thinking is required.

fizzy-pop handles (0 tokens):

  • Card assignment notifications between agents
  • @-mentions for cross-agent handoffs
  • Status updates via card comments
  • Column transitions (Ready → In Progress → Done)
  • The entire "who should work on this" coordination

OpenClaw handles (judgment tokens):

  • PR review: "is this code good enough?"
  • Ambiguity resolution: "this card is underspecified"
  • Escalation to Zain: needs-human cards
  • Cross-agent conflict resolution
  • Dispatcher health monitoring
  • Periodic audits: "are we working on the right things?"

Estimated Token Savings

Activity | Today (all LLM) | Hybrid | Savings
--- | --- | --- | ---
Card polling + triage | ~15K tokens/heartbeat | 0 (dispatcher code) | 100%
Dependency checking | N/A (not implemented) | 0 (blocked-by:NN tags) | New capability, free
Agent spawning + routing | ~5K tokens/dispatch | 0 (tag-based rules) | 100%
Reconciliation | N/A (fire-and-forget) | 0 (dispatcher code) | New capability, free
Inter-agent coordination | ~5K tokens (OpenClaw relays) | 0 (fizzy-pop notifications) | 100%
PR review + judgment | ~10K tokens (when needed) | ~10K tokens (unchanged) | 0% (intentional)
Heartbeat total | ~60K tokens/day | ~15K tokens/day | ~75% reduction
Net result: ~75% fewer orchestration tokens while gaining dependency tracking, workspace isolation, reconciliation, retry logic, and budget enforcement. The tokens we do spend go to judgment, the one thing an LLM is actually better at than code.
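The table's bottom line, as arithmetic. This assumes a 30-day month, which is the assumption that makes ~60K/day match the ~1.8M/mo figure quoted at the top.

```python
DAYS_PER_MONTH = 30  # assumption behind the ~1.8M/mo figure

today_per_day = 60_000   # ~60K orchestration tokens/day, all LLM
hybrid_per_day = 15_000  # ~15K tokens/day, judgment calls only

reduction = 1 - hybrid_per_day / today_per_day

print(f"today:  {today_per_day * DAYS_PER_MONTH:,} tokens/month")   # today:  1,800,000 tokens/month
print(f"hybrid: {hybrid_per_day * DAYS_PER_MONTH:,} tokens/month")  # hybrid: 450,000 tokens/month
print(f"reduction: {reduction:.0%}")                                # reduction: 75%
```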

What We Keep vs What We Build

Already exists (keep as-is)

  • All agent configs and personalities
  • Fizzy as task source + boards
  • fizzy-pop for agent notifications, mentions, assignments
  • OpenClaw for messaging (Telegram, WhatsApp, etc.)
  • Memory system (MEMORY.md, knowledge/)
  • Codexbar for cost tracking
  • gh-issues + amp-swarm skills
  • All coding agents (Codex, Claude Code, Amp, etc.)

New (to build)

  • Dispatcher script: polls Fizzy, manages concurrency, launches agents
  • Workspace manager: git worktree per card, lifecycle hooks
  • blocked-by:NN tag checker: Card #187 convention
  • Tag-based router: card tags → agent selection
  • Reconciliation loop: stall detection, state refresh, cleanup
  • Budget caps: per-card / per-agent token limits
  • WORKFLOW.md convention for repos (optional)

The Dependency Gap

Fizzy Has No Native Dependencies

Fizzy cards can't natively block other cards. Symphony skips issues with unresolved blockers automatically. Linear and Paperclip both support dependency chains.

We explored this in Fizzy Card #187 (discussed with Wheeljack). Two options:

Option 1: Fork Fizzy

Roughly 5-8 days of effort for native dependency support. But maintaining a fork means ongoing drift risk and fighting against upstream. Not recommended.

Option 2: Tag Convention ✓

Cards carry blocked-by:NN tags. The dispatcher checks whether card #NN is still open before dispatching, and a bot heartbeat checks for stale blockers. Zero code changes to Fizzy.
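A Python sketch of the tag convention. The `blocked-by:NN` format is taken from the text; the `is_open` callback is a hypothetical stand-in for however the dispatcher looks up card status in Fizzy, and malformed tags are simply ignored (an assumption).

```python
import re
from typing import Callable

BLOCKED_BY = re.compile(r"^blocked-by:(\d+)$")


def blockers(tags: list[str]) -> list[int]:
    """Card numbers referenced by blocked-by:NN tags (malformed tags ignored)."""
    return [int(m.group(1)) for t in tags if (m := BLOCKED_BY.match(t.strip()))]


def is_blocked(tags: list[str], is_open: Callable[[int], bool]) -> bool:
    """A card stays blocked while any referenced card is still open."""
    return any(is_open(n) for n in blockers(tags))
```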

Linear solves this at the tracker level, but if a Linear-based agent protocol wastes 5K+ tokens per checkbox toggle (as Symphony's workpad updates do), the savings from deterministic orchestration get eaten right back. The pragmatic path: implement the tag convention rather than switching trackers.