Storage & Blobs

Tael's default storage engine (tael-backend) is a purpose-built tiered store designed for the shape of agent telemetry: a steady stream of wide spans, large LLM payloads, and queries that span both "the last five minutes" and "this trace from last week."

The tiers

  • Write-ahead log (WAL). Every incoming record is appended durably and acknowledged to the client immediately. Records are routed by signal tag (span / log / metric). This is the crash-recovery boundary.
  • Hot tier (LSM). Roughly the last 24h of data in a log-structured merge tree, keyed per signal: spans by (trace_id, span_id), logs by (service, ts), metrics by (name, labels, ts). Fast for recent queries.
  • Cold tier (Parquet). Aged data rolls off to immutable columnar Parquet files on disk, partitioned by tenant/date/hour and sorted per signal. Cheap to keep, efficient to scan.
  • Query engine (DataFusion). Unifies the hot and cold tiers behind one query surface, so a single tael query or tael query sql transparently reads from both.

A background compactor (hourly by default) rolls aged hot data into Parquet, and a retention enforcer drops partitions and garbage-collects unreferenced blobs past their window. All of this is tunable — see Configuration.
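To make the hot/cold split concrete, here is a minimal Python sketch of how a time-range query could be mapped onto tiers, and how a record's timestamp could map onto a tenant/date/hour partition. It is an illustration under assumptions, not Tael's implementation: the function names, the 24-hour constant, and the exact path layout are hypothetical, and a real boundary is fuzzier because compaction runs periodically.

```python
from datetime import datetime, timedelta, timezone

HOT_TIER_HOURS = 24  # assumed value, mirroring "roughly the last 24h"


def partition_path(tenant: str, ts: datetime) -> str:
    """Hypothetical cold-tier partition for a record: tenant/date/hour (UTC)."""
    ts = ts.astimezone(timezone.utc)
    return f"{tenant}/{ts:%Y-%m-%d}/{ts:%H}"


def tiers_for_range(start: datetime, end: datetime, now: datetime,
                    hot_hours: int = HOT_TIER_HOURS) -> list[str]:
    """Return which tiers a query over [start, end] would need to read."""
    hot_boundary = now - timedelta(hours=hot_hours)
    tiers = []
    if end >= hot_boundary:   # range reaches into the recent window
        tiers.append("hot")
    if start < hot_boundary:  # range reaches back past the recent window
        tiers.append("cold")
    return tiers


now = datetime(2024, 5, 2, 12, 0, tzinfo=timezone.utc)
print(tiers_for_range(now - timedelta(minutes=5), now, now))
# ['hot']
print(tiers_for_range(now - timedelta(days=7), now - timedelta(days=6), now))
# ['cold']
print(tiers_for_range(now - timedelta(days=2), now, now))
# ['hot', 'cold']
print(partition_path("acme", now - timedelta(days=7)))
# acme/2024-04-25/12
```

The third call is the case the query engine exists for: one range that straddles the boundary and has to read both tiers.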

Content-addressed blobs

LLM prompts, completions, and oversized log bodies are not stored inline in the columnar tables. They're hashed with SHA-256, stored once in a blob store, and referenced by hash (prompt_sha256, completion_sha256) on the span. This buys three things:

  • Free deduplication. The same 4k-token system prompt sent on every request is stored exactly once.
  • Clean columnar compression. No giant text columns bloating the span table.
  • Decoupled retention. Keep a year of span metadata while expiring the heavy payloads after 30 days — they're on independent clocks.

Blobs are snappy-compressed and GC'd when no live row references them.
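The idea can be sketched in a few lines of Python. This is an in-memory illustration only: the BlobStore class and its put/get/gc methods are hypothetical names, and zlib stands in for Snappy so the example needs nothing beyond the standard library.

```python
import hashlib
import zlib


class BlobStore:
    """In-memory content-addressed store: key = SHA-256 of the raw payload."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, payload: str) -> str:
        raw = payload.encode("utf-8")
        digest = hashlib.sha256(raw).hexdigest()
        # Identical payloads hash to the same key, so they are stored once.
        if digest not in self._blobs:
            self._blobs[digest] = zlib.compress(raw)  # stand-in for Snappy
        return digest

    def get(self, digest: str) -> str:
        return zlib.decompress(self._blobs[digest]).decode("utf-8")

    def gc(self, live: set[str]) -> int:
        """Drop blobs that no live row references; return how many were removed."""
        dead = [d for d in self._blobs if d not in live]
        for d in dead:
            del self._blobs[d]
        return len(dead)

    def __len__(self) -> int:
        return len(self._blobs)


store = BlobStore()
system_prompt = "You are a helpful assistant."
spans = [
    {"span_id": i,
     "prompt_sha256": store.put(system_prompt),
     "completion_sha256": store.put(f"answer {i}")}
    for i in range(3)
]
print(len(store))  # 4: one shared prompt plus three distinct completions

# Suppose only span 2 is still a live row; collect the hashes it references.
live = {spans[2]["prompt_sha256"], spans[2]["completion_sha256"]}
print(store.gc(live))  # 2
print(len(store))      # 2
```

The shared prompt survives garbage collection because span 2 still references it, while the two completions that no live row points to are removed.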

A Tantivy index covers prompt/completion payloads and log bodies, so you can search inside the content, not just the metadata:

tael query traces --text "rate limit" --last 24h

Full-text search is available only on tael-backend (see Storage backends below).
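As a rough mental model of how a content search resolves back to spans, the toy Python sketch below builds an inverted index from terms to blob hashes and then filters spans by the hashes they reference. It is not Tantivy and makes no claim about how Tael wires the index internally; the tokenizer, the AND-only matching, and all names are assumptions made for illustration.

```python
import hashlib
import re
from collections import defaultdict

blobs: dict[str, str] = {}                      # sha256 -> payload text
index: dict[str, set[str]] = defaultdict(set)   # term -> blob hashes


def add_blob(text: str) -> str:
    """Store a payload under its SHA-256 and index each of its terms."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    blobs[digest] = text
    for term in re.findall(r"\w+", text.lower()):
        index[term].add(digest)
    return digest


def search(query: str) -> set[str]:
    """Hashes of blobs containing every query term (AND match, no phrases)."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return set()
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits


spans = [
    {"span_id": "a",
     "completion_sha256": add_blob("Error: rate limit exceeded, retry later")},
    {"span_id": "b",
     "completion_sha256": add_blob("Here is your summary.")},
]
matches = search("rate limit")
print([s["span_id"] for s in spans if s["completion_sha256"] in matches])
# ['a']
```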

Storage backends

  • tael-backend (default). The tiered engine described above (WAL + hot LSM + cold Parquet + blobs + full-text). Use this unless you have a reason not to.
  • duckdb (legacy). An embedded DuckDB backend: single-writer and simpler. Select it with tael serve --storage duckdb or TAEL_STORAGE=duckdb. Full-text --text search is not available on this backend.

  • Configuration: TAEL_DATA_DIR, TAEL_HOT_TIER_HOURS, TAEL_COMPACT_INTERVAL_SECS, TAEL_TRACE_RETENTION_DAYS, and more.
  • Data Model: the records these tiers store.
