Storage & Blobs
Tael's default storage engine (tael-backend) is a purpose-built tiered store designed for the shape of agent telemetry: a steady stream of wide spans, large LLM payloads, and queries that span both "the last five minutes" and "this trace from last week."
The tiers
- Write-ahead log (WAL): Every incoming record is appended durably and acked to the client immediately. Routed by signal tag (span / log / metric). This is the crash-recovery boundary.
- Hot tier (LSM): Roughly the last 24h of data in a log-structured merge tree, keyed per signal: spans by (trace_id, span_id), logs by (service, ts), metrics by (name, labels, ts). Fast for recent queries.
- Cold tier (Parquet): Aged data rolls off to immutable columnar Parquet files on disk, partitioned by tenant/date/hour and sorted per signal. Cheap to keep, efficient to scan.
- Query engine (DataFusion): Unifies the hot and cold tiers behind one query surface, so a single tael query or tael query sql transparently reads from both.
A background compactor (hourly by default) rolls aged hot data into Parquet, and a retention enforcer drops partitions and garbage-collects unreferenced blobs past their window. All of this is tunable — see Configuration.
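The flow between tiers can be pictured with a small in-memory sketch. This is an illustration only, not Tael's implementation: the class and function names (Span, TieredStore, partition_key) are hypothetical, a plain list stands in for the durable WAL, a dict stands in for the LSM tree, and lists stand in for Parquet files. The exact partition path layout is also an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class Span:
    tenant: str
    trace_id: str
    span_id: str
    ts: datetime


def partition_key(span: Span) -> str:
    """Cold-tier partition in tenant/date/hour form (exact layout is assumed)."""
    return f"{span.tenant}/{span.ts:%Y-%m-%d}/{span.ts:%H}"


@dataclass
class TieredStore:
    hot_window: timedelta = timedelta(hours=24)  # "roughly the last 24h"
    wal: list = field(default_factory=list)      # stand-in for the durable log
    hot: dict = field(default_factory=dict)      # (trace_id, span_id) -> Span
    cold: dict = field(default_factory=dict)     # partition key -> sorted spans

    def write(self, span: Span) -> None:
        # Append to the log first, then make the record queryable in the hot tier.
        self.wal.append(span)
        self.hot[(span.trace_id, span.span_id)] = span

    def compact(self, now: datetime) -> int:
        """Move spans older than the hot window into cold partitions."""
        cutoff = now - self.hot_window
        aged = [key for key, span in self.hot.items() if span.ts < cutoff]
        for key in aged:
            span = self.hot.pop(key)
            part = self.cold.setdefault(partition_key(span), [])
            part.append(span)
            part.sort(key=lambda s: (s.trace_id, s.span_id))
        return len(aged)

    def query_trace(self, trace_id: str) -> list:
        """Read one trace from both tiers, the way a unified query surface would."""
        hits = [s for s in self.hot.values() if s.trace_id == trace_id]
        for part in self.cold.values():
            hits.extend(s for s in part if s.trace_id == trace_id)
        return sorted(hits, key=lambda s: s.ts)


now = datetime(2024, 5, 2, 12, 0, tzinfo=timezone.utc)
store = TieredStore()
store.write(Span("acme", "t1", "s1", now - timedelta(hours=30)))
store.write(Span("acme", "t1", "s2", now - timedelta(minutes=5)))
moved = store.compact(now)
trace = store.query_trace("t1")
```

After compaction the 30-hour-old span lives in a cold partition and the recent one stays hot, yet query_trace still returns both. That is the property the query engine provides over the real tiers.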
Content-addressed blobs
LLM prompts, completions, and oversized log bodies are not stored inline in the columnar tables. They're hashed with SHA-256, stored once in a blob store, and referenced by hash (prompt_sha256, completion_sha256) on the span. This buys three things:
- Free deduplication. The same 4k-token system prompt sent on every request is stored exactly once.
- Clean columnar compression. No giant text columns bloating the span table.
- Decoupled retention. Keep a year of span metadata while expiring the heavy payloads after 30 days — they're on independent clocks.
Blobs are Snappy-compressed and garbage-collected once no live row references them.
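A minimal sketch of the content-addressing idea, under stated assumptions: BlobStore and live_references are hypothetical names, span rows are modeled as plain dicts, and zlib from the standard library stands in for Snappy. Only the column names prompt_sha256 and completion_sha256 come from the description above.

```python
import hashlib
import zlib


class BlobStore:
    """Toy content-addressed store: payloads are keyed by their SHA-256 digest."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, payload: str) -> str:
        data = payload.encode("utf-8")
        digest = hashlib.sha256(data).hexdigest()
        # Identical payloads hash to the same key, so they are stored once.
        if digest not in self._blobs:
            self._blobs[digest] = zlib.compress(data)  # zlib stands in for Snappy
        return digest

    def get(self, digest: str) -> str:
        return zlib.decompress(self._blobs[digest]).decode("utf-8")

    def gc(self, live_digests: set[str]) -> int:
        """Drop blobs that no live row references; return how many were removed."""
        dead = [d for d in self._blobs if d not in live_digests]
        for d in dead:
            del self._blobs[d]
        return len(dead)

    def __len__(self) -> int:
        return len(self._blobs)


def live_references(spans: list[dict]) -> set[str]:
    """Collect every blob hash still referenced by a span row."""
    refs = set()
    for span in spans:
        for column in ("prompt_sha256", "completion_sha256"):
            if span.get(column):
                refs.add(span[column])
    return refs


blobs = BlobStore()
system_prompt = "You are a helpful assistant. " * 100
spans = [
    {
        "span_id": f"s{i}",
        "prompt_sha256": blobs.put(system_prompt),
        "completion_sha256": blobs.put(f"answer {i}"),
    }
    for i in range(3)
]
stored_before_gc = len(blobs)  # one shared prompt plus three completions

# Expire one payload while keeping the span row itself.
spans[0]["completion_sha256"] = None
removed = blobs.gc(live_references(spans))
```

Three spans share a single stored prompt, and clearing one reference lets the collector reclaim that payload while the span metadata stays in place. This mirrors the independent retention clocks for metadata and payloads.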
Full-text search
A Tantivy index covers prompt/completion payloads and log bodies, so you can search inside the content, not just the metadata:
tael query traces --text "rate limit" --last 24h
Full-text search is available only on tael-backend (see below).
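To show what searching inside the content means, here is a toy inverted index. It is not Tantivy and not how Tael indexes data: TextIndex is a hypothetical name, tokenization is a simple lowercase split, and a query matches documents containing all of its terms (how the real --text flag treats multi-word queries is not stated here).

```python
import re
from collections import defaultdict


class TextIndex:
    """Toy inverted index mapping each term to the ids of documents containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    @staticmethod
    def _terms(text):
        # Lowercase and split on anything that is not a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id, text):
        for term in self._terms(text):
            self._postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents that contain every term in the query."""
        terms = self._terms(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


index = TextIndex()
index.add("span-1", "HTTP 429: rate limit exceeded, retrying")
index.add("span-2", "context length limit reached")
index.add("log-7", "Rate-Limit header missing")
matches = index.search("rate limit")
```

The lookup touches only the posting sets for "rate" and "limit" instead of scanning every payload, which is why an index makes content search practical over large prompt and log bodies.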
Storage backends
- tael-backend (default): The tiered engine described above (WAL + hot LSM + cold Parquet + blobs + full-text). Use this unless you have a reason not to.
- duckdb (legacy): An embedded DuckDB backend, single-writer, simpler. Select it with tael serve --storage duckdb or TAEL_STORAGE=duckdb. Full-text --text search is not available on this backend.
Related
- Configuration: TAEL_DATA_DIR, TAEL_HOT_TIER_HOURS, TAEL_COMPACT_INTERVAL_SECS, TAEL_TRACE_RETENTION_DAYS, and more.
- Data Model: the records these tiers store.
