Wide Events

The single most important practice for getting value out of Tael is the wide event: treat each span as one rich record of a unit of work, and attach every fact you might want later as a span attribute. Get this right and the rest of Tael — querying, correlation, anomaly detection — falls out for free.

The idea

A traditional app scatters its state across dozens of narrow log lines ("user authenticated", "cache miss", "retrying") and a handful of pre-aggregated metrics. To reconstruct what happened, you grep logs, eyeball timestamps, and guess at causality.

A wide event inverts that. One span per unit of work carries everything: the IDs, the counts, the feature flags, which code path was taken, the duration of sub-steps, the error details, the version, the git SHA. When something goes wrong, the failing span already has the full context — no correlation by timestamp required.

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

# user_id, cart, total, flags, provider, and GIT_SHA come from the surrounding handler.
with tracer.start_as_current_span("checkout") as span:
    # Facts known before the work starts
    span.set_attribute("user.id", user_id)
    span.set_attribute("cart.item_count", len(cart))
    span.set_attribute("cart.total_usd", total)
    span.set_attribute("flag.new_pricing", flags["new_pricing"])
    span.set_attribute("payment.provider", provider)
    span.set_attribute("code.git_sha", GIT_SHA)
    # ...do the work, attaching results as you go...
    span.set_attribute("payment.latency_ms", payment_ms)
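
When the facts already live in a dict, setting them one call at a time gets tedious. The following is a minimal sketch of a helper that flattens nested data into dotted attribute keys. The names `flatten_attributes` and `attach` are hypothetical and not part of Tael or OpenTelemetry; the only assumption about `span` is that it exposes `set_attribute(key, value)`, as OpenTelemetry spans do. Values that are not strings, booleans, or numbers are converted to strings here, which is one possible policy rather than a requirement.

```python
from typing import Any, Dict, Mapping

# Value types this sketch passes through unchanged.
PRIMITIVES = (str, bool, int, float)


def flatten_attributes(facts: Mapping[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested mappings into dotted keys.

    {"cart": {"total_usd": 5}} becomes {"cart.total_usd": 5}.
    """
    flat: Dict[str, Any] = {}
    for key, value in facts.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, Mapping):
            # Recurse so nested facts keep their parent key as a prefix.
            flat.update(flatten_attributes(value, name))
        elif value is None:
            # Skip facts that are not known yet.
            continue
        elif isinstance(value, PRIMITIVES):
            flat[name] = value
        else:
            # Fall back to a string for anything else (an assumption of this sketch).
            flat[name] = str(value)
    return flat


def attach(span: Any, facts: Mapping[str, Any]) -> None:
    """Set every flattened fact on a span-like object."""
    for key, value in flatten_attributes(facts).items():
        span.set_attribute(key, value)
```

Inside a handler, a call such as `attach(span, {"user": {"id": user_id}, "cart": {"item_count": len(cart)}})` would then produce the same `user.id` and `cart.item_count` attributes as the explicit calls above.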

Rules of thumb

One span per unit of work. One request handler, one job, one CLI command, one agent step.
Attach every useful fact as an attribute. IDs, counts, chosen branches, flags, sub-step timings, versions. Disk is cheap; a missing attribute during an incident is not.
On error, keep going. Set status = error, record the exception — but attach all the context you have before returning. A failed span with 50 attributes is worth far more than a stack trace alone.
Use span duration for timing. Don't emit a separate latency metric for something a span already measures.
Child spans are for sub-operations with real duration. A DB query, an HTTP call, a cache lookup. Auto-instrumentation creates most of these for you; they inherit trace_id automatically.
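
The "on error, keep going" rule can be sketched as follows. `RecordingSpan` is a hypothetical stand-in so the example runs without any SDK installed, and `charge` and `pay` are illustrative names. With the OpenTelemetry Python API, the span would come from the tracer and the status would be `Status(StatusCode.ERROR)` from `opentelemetry.trace` rather than a plain string.

```python
from typing import Any, Callable, Dict, List


class RecordingSpan:
    """Hypothetical stand-in for a real span; it only records what it is given."""

    def __init__(self) -> None:
        self.attributes: Dict[str, Any] = {}
        self.exceptions: List[BaseException] = []
        self.status = "unset"

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value

    def record_exception(self, exc: BaseException) -> None:
        self.exceptions.append(exc)

    def set_status(self, status: str) -> None:
        self.status = status


def charge(span: Any, provider: str, amount_usd: float,
           pay: Callable[[float], str]) -> str:
    """Run a payment, leaving full context on the span whether or not it succeeds."""
    # Attach what is known up front, before anything can fail.
    span.set_attribute("payment.provider", provider)
    span.set_attribute("payment.amount_usd", amount_usd)
    try:
        receipt_id = pay(amount_usd)
        span.set_attribute("payment.receipt_id", receipt_id)
        return receipt_id
    except Exception as exc:
        # Mark the failure and add error context; the earlier attributes stay on the span.
        span.record_exception(exc)
        span.set_status("error")
        span.set_attribute("error.type", type(exc).__name__)
        raise
```

The failed span ends up with the provider, the amount, and the error type alongside the recorded exception, so the failure can be filtered and grouped like any other span.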

Anti-patterns

A log line per variable. If you find yourself logging "count = 7", that's a span attribute, not a log.
A span per loop iteration. Spans are units of work, not print statements. Aggregate inside one span instead.
Stripping "noisy" attributes. The attribute you delete to reduce noise is the one you'll want at 3am. Keep it.
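
As an alternative to a span per loop iteration, the loop's outcome can be reduced to a few attributes on the enclosing span. This is a minimal sketch; `summarize_iterations` and the `items.*` attribute names are hypothetical, and each result is assumed to be a `(succeeded, duration_ms)` pair.

```python
from typing import Dict, Iterable, Tuple, Union


def summarize_iterations(
    results: Iterable[Tuple[bool, float]],
) -> Dict[str, Union[int, float]]:
    """Reduce per-item (succeeded, duration_ms) pairs to a handful of span attributes."""
    count = 0
    failed = 0
    total_ms = 0.0
    max_ms = 0.0
    for succeeded, duration_ms in results:
        count += 1
        if not succeeded:
            failed += 1
        total_ms += duration_ms
        max_ms = max(max_ms, duration_ms)
    return {
        "items.count": count,
        "items.failed_count": failed,
        "items.total_ms": total_ms,
        "items.max_ms": max_ms,
    }
```

After the loop finishes, each entry of the returned dict can be passed to `span.set_attribute(key, value)` on the single span that covers the whole batch.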

Why it pays off

Because the facts live on the span as structured fields, every Tael query can use them:

# Filter on any attribute you attached
tael query traces --attribute flag.new_pricing=true --status error --last 1h

# Or reach for SQL when you need aggregation
tael query sql "SELECT attributes->>'payment.provider' AS provider, COUNT(*) AS errors
                FROM spans WHERE status='error' GROUP BY provider ORDER BY errors DESC"

The wide-events discipline is what turns "we have traces" into "an agent can answer questions about production."

Related

Data Model — the fields available on every span.
Instrumenting Your App — wiring the OTel SDK to emit these spans.
Debugging with an Agent — how an agent uses wide events to investigate.
