perf sentineldocs
FRGitHub
Documentation / Architecture

Architecture

perf-sentinel is a polyglot performance anti-pattern detector built as a Rust workspace with two crates:

  • sentinel-core : library containing all pipeline logic
  • sentinel-cli : binary providing the CLI entry point (perf-sentinel)

Glossary

A few terms recur across the docs and code; the distinctions matter:

  • Event — a single normalized SpanEvent. The pre-detection unit (input).
  • Finding — a detected anti-pattern (output of detect). The product-level concept exposed in CLI output, JSON, SARIF, HTML, the daemon API and acks.
  • Pattern — the normalized SQL/HTTP template that groups events for a finding (e.g. SELECT * FROM users WHERE id = ?).
  • Detection — the act, or the family of detectors (n_plus_one, chatty, ...). Not a synonym of finding.

Operating modes:

  • Batch mode (perf-sentinel analyze) — processes a complete trace set and produces a single Report. Used in CI integration tests, ad-hoc analysis, and perf-sentinel report (HTML).
  • CI mode (perf-sentinel analyze --ci) — same as batch, but the quality gate failure exits with code 1. The Confidence field on each finding is stamped ci_batch (lowest confidence: limited traffic shapes).
  • Daemon mode / streaming mode / watch mode (perf-sentinel watch) — long-running process ingesting OTLP/JSON events in real time, with TTL-based eviction and per-trace detection. Findings are stamped daemon_staging or daemon_production based on [daemon] daemon_environment (boosts severity in downstream consumers like perf-lint).

The Confidence axis (ci_batch < daemon_staging < daemon_production) is stamped automatically by the runtime and exposed in JSON, SARIF, HTML and the daemon query API.

Pipeline overview

Operating modes

Batch mode (perf-sentinel analyze)

Processes a complete set of events and produces a single report with quality gate evaluation.

Vec<SpanEvent>
  -> normalize::normalize_all()        -> Vec<NormalizedEvent>
  -> correlate::correlate()            -> Vec<Trace>
  -> detect::detect()                  -> Vec<Finding>
  -> score::score_green()              -> (Vec<Finding>, GreenSummary)
  -> quality_gate::evaluate()          -> QualityGate
  -> Report { analysis, findings, green_summary, quality_gate }

In CI mode (--ci), the process exits with code 1 if the quality gate fails.

Streaming mode (perf-sentinel watch)

Runs as a daemon, receiving events in real time and emitting findings as they are detected.

OTLP gRPC (port 4317)  \
OTLP HTTP (port 4318)   +---> mpsc channel ---> TraceWindow (LRU + TTL)
JSON unix socket       /                               |
                                              +--------+--------+
                                              |                 |
                                        LRU eviction    TTL eviction
                                              |                 |
                                              +--------+--------+
                                                       |
                                          normalize -> detect -> score
                                                       |
                                              NDJSON findings (stdout)
                                              Prometheus /metrics
  • Events are normalized outside the TraceWindow lock to minimize lock hold time.
  • Traces are evicted when the LRU cache is full (max_active_traces) or when TTL expires (trace_ttl_ms).
  • On eviction, the trace is analyzed through detect and score stages.
  • Findings are emitted as newline-delimited JSON to stdout.
  • Prometheus metrics are exposed on the same HTTP port (4318) at /metrics.

Daemon query API

In watch mode, the daemon exposes its internal state via HTTP endpoints on port 4318 alongside /v1/traces and /metrics:

  • GET /api/findings (filterable, capped at 1000)
  • GET /api/findings/{trace_id}
  • GET /api/explain/{trace_id} (tree with findings inline, from daemon memory)
  • GET /api/correlations (cross-trace correlations, capped at 1000)
  • GET /api/status (uptime, active traces, stored findings count)

A FindingsStore ring buffer retains recent findings for the API (sized by [daemon] max_retained_findings, default 10k). The companion CLI subcommand perf-sentinel query renders these endpoints in colored terminal output. Gated by [daemon] api_enabled (default true); see Limitations for the no-auth threat model.

Module responsibilities

ModulePathResponsibility
eventevent.rsCore SpanEvent type (SQL and HTTP variants) with timestamp, trace/span IDs, service, operation, target, duration
ingestingest/Input sources: JSON parser with auto-format detection (json.rs), Jaeger JSON import (jaeger.rs), Zipkin JSON v2 import (zipkin.rs), OTLP gRPC+HTTP receiver (otlp.rs). Implements IngestSource trait. PostgreSQL pg_stat_statements parser (pg_stat.rs) for hotspot analysis. Remote trace fetchers for Grafana Tempo (tempo.rs) and the Jaeger query API (jaeger_query.rs), with shared helpers for auth-header handling (auth_header.rs), lookback windows (lookback.rs) and URL encoding (url_enc.rs)
normalizenormalize/Produces NormalizedEvent with template + extracted params. SQL tokenizer (sql.rs): replaces literals, UUIDs, IN lists. HTTP normalizer (http.rs): replaces numeric/UUID path segments, strips query params
correlatecorrelate/Groups events by trace_id. Batch mode (mod.rs): HashMap aggregation. Streaming mode (window.rs): LRU cache with per-trace ring buffer and TTL eviction
detectdetect/Pattern detection on correlated traces. N+1 (n_plus_one.rs): same template, different params, within window. Redundant (redundant.rs): same template and params. Slow (slow.rs): duration above threshold with recurring template. Fanout (fanout.rs): parent span with excessive children. Chatty service (chatty.rs): too many HTTP outbound calls per trace. Pool saturation (pool_saturation.rs): concurrent SQL spans exceeding threshold via sweep-line algorithm. Serialized calls (serialized.rs): sequential independent sibling calls that could be parallelized. Cross-trace correlator for daemon mode (correlate_cross.rs). Remediation hints attached to findings (suggestions.rs). Sanitizer-aware classification (sanitizer_aware.rs): recognizes already-parameterized SQL so N+1 detection can suppress safe cases (SQL only today)
scorescore/GreenOps scoring (mod.rs): IIS per endpoint, waste ratio, top offenders, green_impact per finding. Carbon pipeline split across carbon.rs (embedded grid intensity table, SCI constants), carbon_compute.rs (per-span accumulation loop), carbon_profiles.rs (hourly grid profiles), region_breakdown.rs (region folding, model-tag selection, CarbonReport finalization). Five opt-in power and energy backends, each a structured submodule: scaphandre/ and kepler/ (RAPL), redfish/ (BMC telemetry), cloud_energy/ (CPU% + embedded SPECpower interpolation), electricity_maps/ (live grid intensity, only when configured). Shared scraper plumbing: energy_state.rs (per-service coefficient cache) and ops_snapshot_diff.rs (per-service op deltas per scrape window). Fixed-threshold avoidable computation for disclosure anti-gaming (canonical.rs). Multi-region SCI v1.0 via OTel cloud.region attribute, confidence intervals, hourly profiles for FR/DE/GB/US-East
reportreport/Output formatting. JSON report (json.rs), SARIF v2.1.0 export (sarif.rs), colored CLI output (mod.rs), Prometheus metrics with OpenMetrics exemplars (metrics.rs), single-file HTML dashboard (html.rs + bundled html_template.html) with Findings, Explain, pg_stat, Diff, Correlations and Green tabs. Plain-language finding interpretations (interpret.rs), report-level warnings (warnings.rs), and the periodic public-disclosure stack (periodic/: schema, aggregator, validator, org config, content hasher, attestation)
quality_gatequality_gate.rsEvaluates configurable threshold rules against findings and green summary
pipelinepipeline.rsWires all stages together for batch mode: normalize -> correlate -> detect -> score -> quality_gate -> Report
daemondaemon/Streaming mode: mod.rs hosts DaemonError and the run() orchestrator. Responsibilities split across event_loop.rs (tokio::select loop, TraceWindow eviction, single analysis worker running detect/score off the loop), listeners.rs (OTLP gRPC/HTTP spawn, optional energy scrapers), tls.rs (cert loading, MaybeTlsStream, HTTPS serve loop with handshake-concurrency cap), json_socket.rs (Unix NDJSON ingestion, cfg(unix)), sampling.rs (trace-id hash sampling), findings_store.rs + query_api.rs (retained findings and HTTP query endpoints), ack.rs (acknowledgment store and HTTP ack endpoints), health.rs (liveness and readiness), archive.rs (per-window NDJSON Report archive consumed by disclose)
configconfig.rsParses .perf-sentinel.toml with the sectioned format ([thresholds], [detection], [green], [daemon]). The 8 legacy top-level keys accepted by 0.5.x were removed in 0.6.0 (see Configuration for the migration table)
timetime.rsShared timestamp conversion helpers (nanos_to_iso8601, micros_to_iso8601). Used by OTLP, Jaeger and Zipkin ingestion
explainexplain.rsTrace tree viewer: builds span tree from parent_span_id, annotates findings inline (suggestion, framework-specific fix, code location). Text and JSON output
diffdiff.rsRegression delta between two trace sets. Backs the diff subcommand used in PR CI
calibratecalibrate.rsTunes I/O-to-energy coefficients from baseline traces plus a measured power CSV. Backs the calibrate subcommand
acknowledgmentsacknowledgments.rs.perf-sentinel-acknowledgments.toml filtering for CI, with a sha2 canonical signature per finding
http_clienthttp_client.rsThe single TLS-capable outbound HTTP client (body-size limits, endpoint redaction) shared by the energy scrapers, Tempo and Jaeger ingestion, and query
shutdownshutdown.rsShutdown-signal future (SIGINT everywhere, SIGTERM on Unix) shared by the daemon event loop and the Tempo fetch loop
text_safetytext_safety.rsPure helpers shared by every text renderer that prints attacker-controlled strings to a terminal: sanitize_for_terminal strips ASCII control characters to neutralise ANSI / OSC 8 / cursor injection; safe_url gates suggested_fix.reference_url to HTTPS without control chars

Key types

TypeModuleDescription
SpanEventeventRaw I/O event (SQL query or HTTP call) with metadata and optional parent_span_id
NormalizedEventnormalizeSpanEvent enriched with normalized template and extracted parameters
TracecorrelateCollection of NormalizedEvents sharing the same trace_id
FindingdetectDetected anti-pattern with type, severity, pattern details, timestamps, green_impact and confidence (CI batch / daemon staging / daemon production)
GreenSummaryscoreAggregate I/O stats: total ops, avoidable ops, waste ratio, top offenders, optional CO2
QualityGatequality_gatePass/fail result with individual rule evaluations
ReportreportComplete analysis output: analysis metadata, findings, green summary, quality gate
ConfigconfigParsed configuration with all sections and validated fields
TraceWindowcorrelate/windowLRU cache of active traces for streaming mode with TTL eviction

Crate boundaries

sentinel-cli (binary)
  |
  +-- clap CLI: analyze / explain / diff / report / watch / query / ack / inspect / tempo / jaeger-query / pg-stat / calibrate / disclose / verify-hash / hash-bake / demo / bench / man / completions
  |
  +-- depends on sentinel-core (library)
        |
        +-- All pipeline logic
        +-- Traits at borders only: IngestSource, ReportSink
        +-- Pure functions between stages

The CLI crate is intentionally thin: it parses arguments, loads config and delegates to sentinel-core functions. All business logic lives in sentinel-core.

Cargo features

Whole subcommands are compile-time gated by Cargo features:

FeatureDefined inDefaultGates
daemoncore and clion in cli, off in corewatch, query, ack, the daemon event loop and the energy scrapers
tuicli onlyoninspect and the interactive --tui previews
tempocore and clion in cli, off in corethe tempo subcommand (Grafana Tempo API ingestion)
jaeger-querycore and clion in cli, off in corethe jaeger-query subcommand (Jaeger query API ingestion)

The published binary enables all four. Library consumers of sentinel-core start with everything off and opt in. Note that tonic, axum, tower, prometheus and opentelemetry-proto remain unconditional dependencies of sentinel-core because report/metrics.rs and ingest/otlp.rs use their types even when no listener runs.