Trace Comments

A trace comment is a persistent annotation attached to a trace — or to a specific span within it — that survives across sessions. It's how an agent records why a trace failed and what it did about it, so the next session (human or agent) starts from the finding instead of re-deriving it.

Adding and reading comments

# Plain finding on a trace
tael comment add abc123def456 "Root cause: expired DB connection pool" --author oncall-bot

# Scoped to a single span
tael comment add abc123def456 "This query needs an index" --span-id xyz789 --author claude

# List everything on a trace
tael comment list abc123def456

--author defaults to cli if you don't pass it. Inside tael live, the c key adds a comment to the trace you're viewing — the same record.

Why comments matter for agents

LLM agents are stateless across runs. Without somewhere to write findings, every investigation starts from zero. Trace comments give an agent a durable place to leave notes attached to the exact evidence:

  • Root-cause findings — "expired connection pool", "rate limited by upstream".
  • Recovery notes — when an agent recovers from an error, it can record what it tried and why it worked.
  • Audit trail — a reviewable history of who concluded what, attached to the trace itself.

Because comments live on the trace, anything that later pulls that trace — tael get trace, tael correlate, the TUI, another agent — sees the annotations too.

Structured comments

A comment body can be free text, or structured JSON that encodes Tael's failure taxonomy. This is what powers the issue / signal / experiment lifecycle: a one-off stumble can be promoted into a recurring issue, tracked as a long-running signal, and validated with a production experiment.

A failure-review comment looks like:

{
  "kind": "failure_review",
  "status": "stumble",
  "failure_mode": "tool_error",
  "impact": "high",
  "last_successful_step": "fetched account balance",
  "first_failure": "refund tool returned 403",
  "summary": "refund tool permission failure loops",
  "representative": true
}

An untrusted agent self-report (lower-trust, kept separate) uses:

{
  "kind": "self_diagnostic",
  "category": "broken_tool",
  "severity": "high",
  "confidence": "medium",
  "summary": "flaky_api times out roughly 1 in 5 calls"
}

Was this page helpful?