Daemon query API
The perf-sentinel daemon exposes an HTTP query API that lets external systems pull findings, trace explanations, cross-trace correlations, and daemon liveness. Use it to feed Prometheus alerts, Grafana dashboards, on-call runbooks or custom CI gate scripts without parsing NDJSON logs.
The API shipped in v0.4.0. This page documents it as a first-class product surface with a stability contract.
Contents
- Endpoint overview: one-line description per endpoint.
- Restricting writes in production: reserve acks and report export to a group with a reverse proxy.
- Endpoints: full per-endpoint reference with request, response and worked examples.
- Error responses: status codes and body shapes.
- Use cases: Prometheus alerting, custom Grafana panels, SRE runbooks.
- Stability contract: the v0.4.1+ stability guarantees.
- See also: cross-references to related docs.
Endpoint overview
| Method | Path | Purpose |
|---|---|---|
| GET | /api/status | Daemon liveness, version, uptime, in-flight counts |
| GET | /api/config | Effective [daemon] configuration, read-only, secrets summarized (since 0.8.8) |
| GET | /api/energy | Live health of the energy/intensity backends (since 0.8.8) |
| GET | /api/findings | Recent findings from the ring buffer, with service, type and severity filters |
| GET | /api/findings/{trace_id} | All findings for one trace |
| GET | /api/explain/{trace_id} | Span tree for a trace still in daemon memory, findings annotated inline |
| GET | /api/correlations | Active cross-trace temporal correlations |
| GET | /api/export/report | Snapshot the live state as a Report JSON, pipe-compatible with report --input - |
| POST | /api/findings/{signature}/ack | Acknowledge a finding at runtime (since 0.5.20) |
| DELETE | /api/findings/{signature}/ack | Revoke a runtime ack |
| GET | /api/acks | List active runtime acks |
All endpoints return application/json. No built-in authentication. The daemon listens on 127.0.0.1 by default (see [daemon] listen_address in Configuration), so the API is reachable only from the host running the daemon unless you explicitly widen the bind address. To let developers read findings while reserving writes (acks) and the official report export to architects or DevOps, see Restricting writes in production.
Deployment notes
- The query API shares the same HTTP port as OTLP HTTP ingestion ([daemon] listen_port_http, default 4318), the /metrics Prometheus scrape endpoint and the GET /health liveness probe. One port, four surfaces.
- The query API can be disabled at startup by setting [daemon] api_enabled = false. Useful when the daemon runs in a hostile multi-tenant host and you only want OTLP ingestion. In that mode, /metrics and /health stay exposed: they are infrastructure surfaces, not part of the query API.
- For Kubernetes or load-balancer probes, prefer GET /health over GET /api/status: /health is always on, holds no locks and stays responsive under any ingestion load.
- The findings ring buffer (a fixed-size circular store that evicts oldest entries when full) is bounded by [daemon] max_retained_findings (default 10000). Older findings are evicted FIFO.
Restricting writes in production (reverse proxy)
A common production requirement is to let any developer read findings while reserving the write paths (acknowledge and revoke) and the official report export to architects or DevOps. This stops a finding from being acked without sign-off from the people accountable for the production posture.
The daemon does not carry an identity provider or a role model. The optional [daemon.ack] api_key (see POST /api/findings/{signature}/ack) is a single shared secret: it gates writes coarsely, but it cannot tell one user from another and cannot express "this group may, that group may not". For per-identity authorization, put a reverse proxy in front of the daemon. The proxy authenticates every caller against your SSO, then authorizes by HTTP method and path. The daemon stays a pure analysis engine, which matches its design (no implicit network surface, no embedded IAM).
The rule the proxy enforces:
| Path | GET | POST / DELETE |
|---|---|---|
| /api/findings, /api/explain/..., /api/correlations, /api/status, /api/config, /api/energy, /api/acks | any authenticated user | not applicable |
| /api/findings/{signature}/ack | not applicable | privileged group only |
| /api/export/report | privileged group only | not applicable |
/api/export/report sits in the privileged column because it materializes the full report snapshot that feeds the official HTML dashboard. Producing an official report is itself a privileged action, see Reporting for the CI-side counterpart (who may run disclose --intent official).
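The rule in the table can be sketched as a small predicate, for instance to unit-test a proxy configuration before rolling it out. This is a minimal sketch, not part of perf-sentinel: the function name required_access and its return values are hypothetical, and the regular expression is the same one the privileged nginx location in this section uses.

```python
import re

# Same pattern as the privileged nginx location in this section.
PRIVILEGED = re.compile(r"^/api/(findings/[^/]+/ack|export/report)$")


def required_access(path: str) -> str:
    """Return the access level the proxy should demand for an /api/ path.

    'privileged' means membership of the privileged group is required,
    'authenticated' means any valid SSO session is enough.
    """
    if PRIVILEGED.match(path):
        return "privileged"
    return "authenticated"
```

The path is matched without its query string, which is also how an nginx location matches.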
oauth2-proxy + nginx
oauth2-proxy handles the OIDC authentication and surfaces the authenticated identity as response headers. Its /oauth2/auth endpoint also enforces group membership per request through the allowed_groups query parameter, so the authorization decision is made by oauth2-proxy, not by fragile nginx if logic. nginx routes privileged paths to a group-checked auth subrequest and everything else to a plain one.
oauth2-proxy.cfg (auth-only mode, nginx does the proxying):
provider = "oidc"
oidc_issuer_url = "https://sso.example.com/realms/prod"
client_id = "perf-sentinel"
client_secret = "${OAUTH2_PROXY_CLIENT_SECRET}" # from your secret manager, never committed
cookie_secret = "${OAUTH2_PROXY_COOKIE_SECRET}" # 32-byte base64
email_domains = ["example.com"]
upstreams = ["static://202"] # auth-only: return 202 on success, nginx proxies the daemon
reverse_proxy = true
set_xauthrequest = true # emit X-Auth-Request-User / -Email / -Groups
oidc_groups_claim = "groups" # so the group claim reaches nginx
scope = "openid email groups"
nginx.conf (relevant server block):
upstream perf_sentinel { server 127.0.0.1:4318; } # daemon, loopback-only
upstream oauth2_proxy { server 127.0.0.1:4180; }
server {
listen 443 ssl;
server_name perf-sentinel.internal;
# ssl_certificate / ssl_certificate_key ...
# oauth2-proxy sign-in and callback routes.
location /oauth2/ {
proxy_pass http://oauth2_proxy;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header X-Auth-Request-Redirect $request_uri;
}
# Plain authentication: any valid SSO session.
location = /oauth2/auth {
internal;
proxy_pass http://oauth2_proxy;
proxy_pass_request_body off;
proxy_set_header Content-Length "";
proxy_set_header X-Original-URI $request_uri;
}
# Group-checked authentication: oauth2-proxy returns 403 when the
# caller is not in the group, which auth_request propagates as 403.
location = /oauth2/auth-admin {
internal;
proxy_pass http://oauth2_proxy/oauth2/auth?allowed_groups=perf-sentinel-admins;
proxy_pass_request_body off;
proxy_set_header Content-Length "";
proxy_set_header X-Original-URI $request_uri;
}
# Privileged routes: ack create/revoke and the official report export.
# A regex location wins over the /api/ prefix, so these never fall
# through to the open rule below.
location ~ ^/api/(findings/[^/]+/ack|export/report)$ {
auth_request /oauth2/auth-admin;
error_page 401 = /oauth2/sign_in;
auth_request_set $auth_user $upstream_http_x_auth_request_user;
proxy_set_header X-User-Id $auth_user; # overwrites any client-supplied value
proxy_pass http://perf_sentinel;
proxy_set_header Host $host;
}
# Everything else under /api/: read access for any authenticated user.
location /api/ {
auth_request /oauth2/auth;
error_page 401 = /oauth2/sign_in;
auth_request_set $auth_user $upstream_http_x_auth_request_user;
proxy_set_header X-User-Id $auth_user;
proxy_pass http://perf_sentinel;
proxy_set_header Host $host;
}
}
Why this is safe
- Bind the daemon to loopback ([daemon] listen_address = "127.0.0.1") or an internal interface the proxy alone can reach. The proxy is the only front door.
- Keep [daemon.ack] api_key set as a second factor. If someone reaches the daemon port directly, bypassing the proxy, they still cannot write without the key.
- The daemon trusts X-User-Id for the audit by field. The nginx block sets it from the authenticated subrequest ($auth_user) and so overwrites any value a client supplies, which closes the spoofing gap. The authenticated identity then lands in the JSONL ack store, giving you an audit trail of who acked what.
- perf-sentinel-admins is illustrative. Use whatever group your IdP exposes in the groups claim.
Endpoints
GET /api/status
Returns a compact liveness object. Use this as a readiness probe or as the cheapest way to verify the daemon is up.
Query parameters: none.
Response shape:
| Field | Type | Description |
|---|---|---|
| version | string | Daemon binary version (Cargo package version) |
| uptime_seconds | number | Seconds since the daemon process started |
| active_traces | number | Traces currently held in the correlation window |
| max_active_traces | number | Configured cap of the correlation window (since 0.8.8) |
| analysis_queue_depth | number | Batches waiting in the analysis worker queue (since 0.8.8) |
| analysis_queue_capacity | number | Configured cap of that queue (since 0.8.8) |
| stored_findings | number | Findings currently retained in the query ring buffer |
| max_retained_findings | number | Configured cap of that ring buffer (since 0.8.8) |
The three gauge/capacity pairs back the Headroom chart of perf-sentinel query monitor's Trends tab: each pair reads as "how close is this runtime gauge to its configured cap". The settings advisor starts hinting at 90% of max_active_traces. The fields are additive; clients written against older daemons keep parsing.
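To make the "how close is this gauge to its cap" reading concrete, the sketch below turns a parsed /api/status body into one ratio per gauge/capacity pair. It is an illustration only: the function name utilization is hypothetical, and it skips any pair whose capacity field is absent, which is what a daemon older than 0.8.8 would return.

```python
# Gauge field -> capacity field, as listed in the response shape above.
PAIRS = {
    "active_traces": "max_active_traces",
    "analysis_queue_depth": "analysis_queue_capacity",
    "stored_findings": "max_retained_findings",
}


def utilization(status: dict) -> dict:
    """Return gauge / capacity for every pair present in a /api/status body."""
    ratios = {}
    for gauge, cap in PAIRS.items():
        # Skip pairs that are missing or have a zero cap (avoids dividing by zero).
        if gauge in status and status.get(cap):
            ratios[gauge] = status[gauge] / status[cap]
    return ratios
```

A ratio of 0.9 or more on active_traces corresponds to the point where the settings advisor starts hinting.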
Example:
curl -sS http://127.0.0.1:4318/api/status
{
"version": "0.8.8",
"uptime_seconds": 48,
"active_traces": 12,
"max_active_traces": 10000,
"analysis_queue_depth": 0,
"analysis_queue_capacity": 1024,
"stored_findings": 5,
"max_retained_findings": 10000
}
GET /api/config
The daemon's effective [daemon] configuration, read-only (since 0.8.8). Backs the Config tab of perf-sentinel query monitor. Built as an explicit allowlist, never a blanket serialization of the internal config, so no secret is exposed: TLS cert/key paths and the ack API key are summarized to booleans (tls_configured, ack_api_key_set) and never echoed. The values are frozen at daemon startup.
Query parameters: none.
Response shape: an object with the [daemon] scalars (listen_addr, listen_port, listen_port_grpc, json_socket, max_active_traces, trace_ttl_ms, sampling_rate, max_events_per_trace, max_payload_size, environment, max_retained_findings, ingest_queue_capacity, analysis_queue_capacity, api_enabled), the summarized sub-systems (tls_configured, ack_enabled, ack_api_key_set, cors_allowed_origins, archive_configured), and the correlation block (correlation_enabled, correlation_window_ms, correlation_lag_threshold_ms, correlation_min_co_occurrences, correlation_min_confidence, correlation_max_tracked_pairs).
Example:
curl -sS http://127.0.0.1:4318/api/config
{
"listen_addr": "127.0.0.1",
"listen_port": 4318,
"max_active_traces": 10000,
"trace_ttl_ms": 30000,
"sampling_rate": 1.0,
"environment": "staging",
"api_enabled": true,
"tls_configured": false,
"ack_enabled": true,
"ack_api_key_set": false,
"cors_allowed_origins": [],
"archive_configured": false,
"correlation_enabled": false,
"correlation_max_tracked_pairs": 10000
}
(Fields elided above for brevity; the live response carries the full set listed under Response shape.)
GET /api/energy
Live health of the five energy/intensity backends (since 0.8.8): the four scraped measured-energy sources (Scaphandre, Kepler, Redfish, cloud SPECpower) and the Electricity Maps real-time intensity API. Backs the Scrapers tab of perf-sentinel query monitor. The effective mix itself (which source won the precedence chain per service, grid intensity per region) lives on /api/export/report under green_summary; this endpoint only answers "is each backend configured, fresh, and succeeding".
Query parameters: none.
Response shape: an object with a backends array of five entries in a fixed order (scaphandre, kepler, redfish, cloud_energy, electricity_maps), each:
| Field | Type | Description |
|---|---|---|
| backend | string | Stable backend name |
| configured | boolean | Whether the backend is configured, from the [green] config frozen at daemon startup |
| last_scrape_age_seconds | number | Seconds since the last successful scrape, as of the backend's most recent scrape tick (same semantics as the /metrics gauge). Omitted when not configured or when the backend has no freshness gauge |
| scrapes_ok | number | Successful scrapes since daemon start. Omitted when not configured or not scraped (cloud_energy, electricity_maps) |
| scrapes_failed | number | Failed scrapes since daemon start. Same omission rules as scrapes_ok |
The optional fields are omitted rather than zeroed for unconfigured backends: the underlying Prometheus gauges are pre-registered at 0, and a literal 0 would read as a fresh scrape. electricity_maps carries no freshness gauge by design; its liveness shows as intensity_source = "real_time" entries on the report's region breakdown.
Two age-reading caveats. A configured backend still reads last_scrape_age_seconds = 0.0 during its first scrape interval after daemon start, before anything has actually been scraped: read it together with scrapes_ok = 0 to tell "not scraped yet" from "fresh". And for cloud_energy the age tracks the reachability of the configured Prometheus endpoint, not per-service coverage: a tick counts as successful as soon as one service yields a reading.
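The omission rules and the first caveat can be folded into one client-side classifier. The sketch below is an assumption-laden illustration: the function name backend_state, the state labels and the max_age_seconds threshold are hypothetical and not something the daemon defines.

```python
def backend_state(entry: dict, max_age_seconds: float = 120.0) -> str:
    """Classify one element of the /api/energy backends array."""
    if not entry.get("configured"):
        return "unconfigured"
    # scrapes_ok is omitted for backends that are not scraped, so only an
    # explicit 0 means "configured, but nothing has been scraped yet".
    if entry.get("scrapes_ok") == 0:
        return "not_scraped_yet"
    age = entry.get("last_scrape_age_seconds")
    if age is None:
        # For example electricity_maps, which carries no freshness gauge.
        return "no_freshness_gauge"
    return "fresh" if age <= max_age_seconds else "stale"
```

Checking scrapes_ok before the age is what keeps the startup value of 0.0 from being read as a fresh scrape.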
Example:
curl -sS http://127.0.0.1:4318/api/energy
{
"backends": [
{
"backend": "scaphandre",
"configured": true,
"last_scrape_age_seconds": 3.0,
"scrapes_ok": 120,
"scrapes_failed": 2
},
{ "backend": "kepler", "configured": false },
{ "backend": "redfish", "configured": false },
{ "backend": "cloud_energy", "configured": false },
{ "backend": "electricity_maps", "configured": true }
]
}
GET /api/findings
Returns a JSON array of recent findings, newest first. Each element wraps the finding itself plus a daemon-side ingestion timestamp.
Query parameters:
| Name | Type | Default | Description |
|---|---|---|---|
| service | string | none | Exact match on the finding.service field |
| type | string | none | Exact match on finding.type in snake_case (e.g. n_plus_one_sql, redundant_sql) |
| severity | string | none | Exact match on finding.severity in snake_case (critical, warning, info) |
| limit | integer | 100 | Maximum number of entries to return, capped server-side at 1000 (higher values are silently clamped) |
Unknown parameters are ignored. Malformed values (e.g. limit=abc) return HTTP 400 with an axum-generated error body.
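A client can build the query string from the documented parameters and apply the same clamp locally, so the URL it logs matches what the server will serve. This is a minimal sketch: findings_url is a hypothetical helper, and rejecting a non-integer limit client-side is a local choice that simply avoids the HTTP 400 described above.

```python
from urllib.parse import urlencode

MAX_LIMIT = 1000  # server-side cap documented in the table above


def findings_url(base, service=None, type_=None, severity=None, limit=100):
    """Build a GET /api/findings URL from the documented filters."""
    if not isinstance(limit, int):
        raise ValueError("limit must be an integer")
    params = {
        "service": service,
        "type": type_,
        "severity": severity,
        "limit": min(limit, MAX_LIMIT),  # mirror the server-side clamp
    }
    # Drop filters that were not supplied, then percent-encode the rest.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base}/api/findings?{query}"
```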
Response shape: array of StoredFinding. Each StoredFinding has:
- finding: the detected finding. See Finding schema below.
- stored_at_ms: integer Unix timestamp in milliseconds, recorded when the daemon inserted this finding into the ring buffer.
Example:
curl -sS "http://127.0.0.1:4318/api/findings?severity=warning&limit=2"
[
{
"finding": {
"type": "n_plus_one_sql",
"severity": "warning",
"trace_id": "trace-n1-sql",
"service": "order-svc",
"source_endpoint": "POST /api/orders/42/submit",
"pattern": {
"template": "SELECT * FROM order_item WHERE order_id = ?",
"occurrences": 6,
"window_ms": 250,
"distinct_params": 6
},
"suggestion": "Use WHERE ... IN (?) to batch 6 queries into one",
"first_timestamp": "2025-07-10T14:32:01.000Z",
"last_timestamp": "2025-07-10T14:32:01.250Z",
"green_impact": {
"estimated_extra_io_ops": 5,
"io_intensity_score": 6.0,
"io_intensity_band": "high"
},
"confidence": "daemon_staging"
},
"stored_at_ms": 1776350162450
},
{
"finding": {
"type": "n_plus_one_http",
"severity": "warning",
"trace_id": "trace-n1-http",
"service": "order-svc",
"source_endpoint": "POST /api/orders/42/submit",
"pattern": {
"template": "GET /api/users/{id}",
"occurrences": 6,
"window_ms": 200,
"distinct_params": 6
},
"suggestion": "Use batch endpoint with ?ids=... to batch 6 calls into one",
"first_timestamp": "2025-07-10T14:32:01.000Z",
"last_timestamp": "2025-07-10T14:32:01.200Z",
"green_impact": {
"estimated_extra_io_ops": 5,
"io_intensity_score": 6.0,
"io_intensity_band": "high"
},
"confidence": "daemon_staging"
},
"stored_at_ms": 1776350162450
}
]
Finding schema
The finding object exposed by /api/findings and /api/findings/{trace_id} is identical to the JSON emitted by perf-sentinel analyze --format json. Stable fields as of v0.4.1:
| Field | Type | Description |
|---|---|---|
| type | string (enum) | n_plus_one_sql, n_plus_one_http, redundant_sql, redundant_http, slow_sql, slow_http, excessive_fanout, chatty_service, pool_saturation, serialized_calls |
| severity | string (enum) | critical, warning, info |
| trace_id | string | Trace ID where the pattern was detected |
| service | string | Service that emitted the anti-pattern |
| source_endpoint | string | Normalized inbound endpoint hosting the pattern |
| pattern | object | { template, occurrences, window_ms, distinct_params } |
| suggestion | string | Human-readable remediation hint |
| first_timestamp | string (ISO 8601) | Earliest span in the detected group |
| last_timestamp | string (ISO 8601) | Latest span in the detected group |
| confidence | string (enum) | ci_batch, daemon_staging, daemon_production |
| green_impact | object (optional) | { estimated_extra_io_ops, io_intensity_score, io_intensity_band } when green scoring is enabled |
| code_location | object (optional) | { function?, filepath?, lineno?, namespace? } when OTel code.* attributes are present |
| suggested_fix | object (optional) | { pattern, framework, recommendation, reference_url? } when the framework can be inferred (Java/JPA in v1) |
GET /api/findings/{trace_id}
Returns all findings whose trace_id matches the path segment, as a JSON array. Same element shape as /api/findings. A hard cap of 1000 entries applies, which guards against pathological traces with hundreds of N+1 clusters.
Path parameter: trace_id (string, exact match). The path segment is URL-decoded by axum before comparison.
Response shape: same Vec<StoredFinding> as /api/findings. An empty array [] is returned when the trace ID is unknown (the endpoint does not return 404).
Example:
curl -sS "http://127.0.0.1:4318/api/findings/trace-n1-sql"
[
{
"finding": {
"type": "n_plus_one_sql",
"severity": "warning",
"trace_id": "trace-n1-sql",
"service": "order-svc",
"source_endpoint": "POST /api/orders/42/submit",
"pattern": {
"template": "SELECT * FROM order_item WHERE order_id = ?",
"occurrences": 6,
"window_ms": 250,
"distinct_params": 6
},
"suggestion": "Use WHERE ... IN (?) to batch 6 queries into one",
"first_timestamp": "2025-07-10T14:32:01.000Z",
"last_timestamp": "2025-07-10T14:32:01.250Z",
"green_impact": {
"estimated_extra_io_ops": 5,
"io_intensity_score": 6.0,
"io_intensity_band": "high"
},
"confidence": "daemon_staging"
},
"stored_at_ms": 1776350162450
}
]
GET /api/explain/{trace_id}
Returns the span tree for a trace still held in the daemon correlation window (default TTL: 30 seconds after the last span of the trace arrived). Useful for debugging a live trace right after it is emitted.
Important: findings are retained in the ring buffer long after the trace itself evicts from the window. That means /api/findings/{trace_id} keeps working for hours after the trace is gone, but /api/explain/{trace_id} only works within the TTL window.
Path parameter: trace_id (string, exact match).
Response shape (trace in memory): object with a roots array. Each node describes a span with:
| Field | Type | Description |
|---|---|---|
| span_id | string | Span identifier |
| parent_span_id | string \| null | Parent span identifier, null for root spans |
| service | string | Service that emitted the span |
| operation | string | Operation name (e.g. SELECT, GET, POST) |
| template | string | Normalized SQL query or HTTP route |
| timestamp | string | ISO 8601 start timestamp |
| duration_us | number | Duration in microseconds |
| findings | array | Findings attached to this span, each { type, severity, suggestion, occurrences } |
| children | array | Child span nodes, recursive |
Response shape (trace unknown or evicted): an object with a single error field.
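Because both shapes come back as JSON objects, a client has to branch on the presence of roots rather than on the status code. The sketch below walks the recursive children arrays and collects the inline findings. The helper name iter_span_findings is hypothetical, and it treats the error shape as "no findings" rather than raising.

```python
def iter_span_findings(explain: dict):
    """Yield (span_id, finding) pairs in depth-first, pre-order traversal.

    Yields nothing when the body is the {"error": ...} shape, since that
    shape has no roots array.
    """
    # Explicit stack instead of recursion, so very deep traces are safe.
    stack = list(reversed(explain.get("roots", [])))
    while stack:
        node = stack.pop()
        for finding in node.get("findings", []):
            yield node["span_id"], finding
        # Push children reversed so they are visited in document order.
        stack.extend(reversed(node.get("children", [])))
```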
Examples:
# Trace still in memory
curl -sS "http://127.0.0.1:4318/api/explain/trace-n1-sql"
{
"roots": [
{
"children": [],
"duration_us": 800,
"findings": [
{
"occurrences": 6,
"severity": "warning",
"suggestion": "Use WHERE ... IN (?) to batch 6 queries into one",
"type": "n_plus_one_sql"
}
],
"operation": "SELECT",
"parent_span_id": null,
"service": "order-svc",
"span_id": "span-1",
"template": "SELECT * FROM order_item WHERE order_id = ?",
"timestamp": "2025-07-10T14:32:01.000Z"
}
]
}
# Trace not in memory (evicted or never seen)
curl -sS "http://127.0.0.1:4318/api/explain/trace-does-not-exist"
{
"error": "trace not found in daemon memory"
}
GET /api/correlations
Returns active cross-trace temporal correlations, sorted by confidence descending. Empty array when [daemon.correlation] enabled = false (default). Capped at 1000 entries.
Query parameters: none.
Response shape: array of CrossTraceCorrelation. Each entry has:
| Field | Type | Description |
|---|---|---|
| source | object | Leading endpoint: { finding_type, service, template } |
| target | object | Trailing endpoint observed after source within lag_threshold_ms |
| co_occurrence_count | number | Number of co-occurrences within the rolling window |
| source_total_occurrences | number | Total occurrences of source in the rolling window |
| confidence | number | Ratio co_occurrence_count / source_total_occurrences |
| median_lag_ms | number | Median lag between source and target |
| first_seen | string | ISO 8601 timestamp of the first co-occurrence |
| last_seen | string | ISO 8601 timestamp of the most recent co-occurrence |
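A consumer that wants to surface only well-supported pairs can filter the array on the two numeric fields above. This is a client-side sketch only: strong_correlations and its default thresholds are hypothetical and independent of the daemon's own [daemon.correlation] settings.

```python
def strong_correlations(entries, min_confidence=0.5, min_co_occurrences=2):
    """Keep correlations that meet both thresholds, preserving server order."""
    return [
        entry
        for entry in entries
        if entry["confidence"] >= min_confidence
        and entry["co_occurrence_count"] >= min_co_occurrences
    ]
```

Since the server already sorts by confidence descending, the filtered list stays sorted.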
Example:
curl -sS "http://127.0.0.1:4318/api/correlations"
[
{
"source": {
"finding_type": "redundant_sql",
"service": "cache-svc",
"template": "SELECT * FROM settings WHERE key = ?"
},
"target": {
"finding_type": "n_plus_one_sql",
"service": "order-svc",
"template": "SELECT * FROM order_item WHERE order_id = ?"
},
"co_occurrence_count": 2,
"source_total_occurrences": 1,
"confidence": 2.0,
"median_lag_ms": 0.0,
"first_seen": "2026-04-16T14:36:02.450Z",
"last_seen": "2026-04-16T14:36:02.450Z"
}
]
GET /api/export/report
Snapshot the daemon's current in-memory state as a Report JSON, identical in shape to perf-sentinel analyze --format json. This closes the loop between the live daemon and the post-mortem perf-sentinel report HTML dashboard: the HTML report can ingest a daemon snapshot over HTTP via standard shell composition.
The analysis section reflects daemon-lifetime counters (cumulative since daemon start). The green_summary field is refreshed by the event loop after each batch (regions, top offenders, avoidable I/O ratio, CO2 numbers, scoring config), so the snapshot carries a live CO2 picture. The chip banner and the GreenOps tab in the HTML dashboard surface naturally on Electricity-Maps-configured daemons. The quality gate is not recomputed on the snapshot path. See 05 · GreenOps & carbon for the full audit-trail story.
Cold-start behavior. When the daemon has not yet processed any event, the endpoint returns 200 OK with an empty Report envelope: findings: [], green_summary: GreenSummary::disabled(0), and warnings: ["daemon has not yet processed any events"]. Pre-0.5.16 this path returned 503 Service Unavailable, which tripped Kubernetes probes and confused CI scripts that treated 5xx as a daemon health issue. The empty envelope lets clients distinguish "cold start" from "events seen, zero findings" (the latter returns 200 with no warning string and analysis.events_processed > 0) without a status code mismatch. The double-counter guard (events_processed_total > 0 AND traces_analyzed_total > 0) is preserved internally so the snapshot stays self-consistent during the trace_ttl_ms / 2 window between the first event ingest and the first eviction tick.
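A CI script can tell the two 200 responses apart by looking at the warnings array instead of the status code. The sketch below assumes the warning string quoted above is matched verbatim. The function name snapshot_state and the three labels are hypothetical.

```python
COLD_START_WARNING = "daemon has not yet processed any events"


def snapshot_state(report: dict) -> str:
    """Classify a parsed /api/export/report body."""
    if COLD_START_WARNING in report.get("warnings", []):
        return "cold_start"
    if report.get("findings"):
        return "has_findings"
    # No warning and an empty findings array: events were seen, nothing detected.
    return "no_findings"
```

A gate script would typically retry on cold_start and only evaluate findings in the other two states.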
Prometheus metric. Each request bumps perf_sentinel_export_report_requests_total so operators can dashboard or alert on Report snapshot frequency.
Example:
# Materialize a live daemon snapshot as an HTML dashboard
curl -s http://daemon.internal:4318/api/export/report \
| perf-sentinel report --input - --output report.html
The report subcommand auto-detects the JSON shape: a top-level array is treated as trace events (pipelined through normalize + detect + score), a top-level object is treated as a pre-computed Report (taken as-is). The Correlations tab in the HTML dashboard lights up automatically when the daemon-produced Report carries non-empty correlations.
POST /api/findings/{signature}/ack
Acknowledge a finding at runtime. The signature is the canonical <finding_type>:<service>:<sanitized_endpoint>:<sha256-prefix> produced by the same hashing logic as the CI TOML workflow (see Acknowledgments). Available since 0.5.20.
The daemon maintains a JSONL append-only store at ~/.local/share/perf-sentinel/acks.jsonl by default (configurable via [daemon.ack] storage_path). The store is replayed and compacted at every daemon restart, so an ack/unack churn loop cannot accumulate forever.
Headers:
- Content-Type: application/json (required, even with an empty body).
- X-User-Id: <identifier> (optional, populates the audit by field with priority over the JSON body, falling back to "anonymous").
- X-API-Key: <secret> (required only when [daemon.ack] api_key is set in the daemon config, constant-time compared).
Body (all fields optional):
{
"by": "alice@example.com",
"reason": "deferred to next quarter, see TICKET-1234",
"expires_at": "2026-08-01T00:00:00Z"
}
Responses:
| Status | Condition |
|---|---|
| 201 | Ack created |
| 400 | Signature does not match the canonical format |
| 401 | [daemon.ack] api_key is set, header is missing or wrong |
| 409 | The signature is already acked (use DELETE first to revoke) |
| 503 | [daemon.ack] enabled = false, the runtime ack store is offline |
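For scripted acks, the headers and body described above can be assembled with the standard library before anything is sent. This is a sketch under assumptions: build_ack_request is a hypothetical helper, it sends {} when no body field is given, and it percent-encodes the signature while leaving colons intact.

```python
import json
import urllib.request
from urllib.parse import quote


def build_ack_request(base, signature, reason=None, expires_at=None,
                      user=None, api_key=None):
    """Build (but do not send) a POST /api/findings/{signature}/ack request."""
    fields = {"reason": reason, "expires_at": expires_at}
    body = {k: v for k, v in fields.items() if v is not None}
    # Content-Type is required even when the body carries no fields.
    headers = {"Content-Type": "application/json"}
    if user:
        headers["X-User-Id"] = user      # takes priority over a "by" body field
    if api_key:
        headers["X-API-Key"] = api_key   # only needed when the daemon sets api_key
    url = f"{base}/api/findings/{quote(signature, safe=':')}/ack"
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )
```

Passing the result to urllib.request.urlopen sends it. Non-2xx statuses from the table above surface as urllib.error.HTTPError, whose code attribute carries the status.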
Example:
SIG="n_plus_one_sql:order-svc:_api_v1_orders:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
curl -fsS -X POST "http://127.0.0.1:4318/api/findings/${SIG}/ack" \
-H "Content-Type: application/json" \
-H "X-User-Id: alice@example.com" \
-d '{"reason":"deferred to next quarter","expires_at":"2026-08-01T00:00:00Z"}'
# 201 Created
After a successful ack, GET /api/findings filters the entry out by default. Pass ?include_acked=true to see it back with an acknowledged_by annotation.
DELETE /api/findings/{signature}/ack
Revoke a previously created daemon ack. Same auth headers as POST. The matching finding reappears on GET /api/findings immediately.
Responses:
| Status | Condition |
|---|---|
| 204 | Ack revoked |
| 400 | Signature does not match the canonical format |
| 401 | API key required and missing or wrong |
| 404 | The signature is not currently acked at the daemon |
| 503 | Runtime ack store offline |
Note: this endpoint only revokes daemon-side acks. CI TOML acks are read-only at runtime and require a PR against the .perf-sentinel-acknowledgments.toml file to remove.
GET /api/acks
Returns the array of active runtime acks (post-replay, post-expiry filter). Read-only, no auth required (reads on a loopback API are considered safe even when the daemon enforces an API key on writes).
Response: array of objects, one per active ack:
[
{
"action": "ack",
"signature": "n_plus_one_sql:order-svc:_api_v1_orders:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
"by": "alice@example.com",
"reason": "deferred to next quarter",
"at": "2026-05-04T13:30:00Z",
"expires_at": "2026-08-01T00:00:00Z"
}
]
This endpoint surfaces only the daemon-side JSONL acks. CI TOML acks loaded at startup are not included, query the TOML file directly for that view, or call GET /api/findings?include_acked=true and inspect the acknowledged_by.source field to see both sources unified.
TOML and JSONL interop
The daemon reads .perf-sentinel-acknowledgments.toml (path configurable via [daemon.ack] toml_path) at startup and unions its entries with the JSONL store at query time. TOML wins on conflict: when a signature is acked in both, the response carries the TOML metadata (source: "toml"). This keeps the CI baseline immutable from the daemon side, an SRE cannot accidentally override what the team agreed to in PR review.
| Source | Persistence | Audit | Mutable at runtime |
|---|---|---|---|
| TOML | Repo file | git log | No (PR-only) |
| Daemon | acks.jsonl on disk | JSONL append + compaction | Yes (POST/DELETE) |
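The "TOML wins on conflict" rule amounts to a dictionary union in which the TOML entries are applied last. The sketch below illustrates the rule, not the daemon's actual code: union_acks is hypothetical, both inputs are assumed to be keyed by signature, and the "daemon" source label is an assumption (only source: "toml" is stated above).

```python
def union_acks(toml_acks: dict, jsonl_acks: dict) -> dict:
    """Merge two {signature: metadata} maps; TOML metadata wins on conflict."""
    merged = {
        sig: {**meta, "source": "daemon"}  # label assumed for illustration
        for sig, meta in jsonl_acks.items()
    }
    # Applied second, so a signature acked in both keeps the TOML metadata.
    merged.update(
        {sig: {**meta, "source": "toml"} for sig, meta in toml_acks.items()}
    )
    return merged
```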
Behavior change in 0.5.20: /api/findings default filter
GET /api/findings (and the ?service= / ?type= / ?severity= filters) now omits acked findings by default. Pass ?include_acked=true to restore the pre-0.5.20 behavior. The opt-in default mirrors the CLI 0.5.17 --acknowledgments semantics: an operator looking at "what is currently broken" should not be drowned in entries the team has already triaged.
The /api/findings/{trace_id} and /api/export/report endpoints intentionally keep their previous shape, the per-trace and full-report views are diagnostic and may need to surface acked findings even in the default path.
Error responses
| Condition | Status | Body |
|---|---|---|
| Unknown trace_id on /api/findings/{trace_id} | 200 | [] |
| Unknown trace_id on /api/explain/{trace_id} | 200 | {"error": "trace not found in daemon memory"} |
| Correlations disabled or correlator idle | 200 | [] |
| /api/export/report on cold-start daemon | 200 | empty Report envelope with warnings: ["daemon has not yet processed any events"] (pre-0.5.16: 503) |
| Malformed query parameter (e.g. limit=abc) | 400 | axum-generated plain-text error |
| Unknown path (e.g. /api/does-not-exist) | 404 | empty body |
| Method not supported on the path (e.g. POST on a GET-only endpoint) | 405 | axum-generated plain-text error |
Apart from the 503 that the ack endpoints return when the runtime ack store is disabled, the API does not emit 5xx in normal operation. A process crash surfaces as whatever the TCP stack emits (connection reset).
Use cases
Prometheus alerting on critical findings
One option is to run a Prometheus Blackbox exporter that probes /api/findings?severity=critical&limit=1 and alerts when the response array is non-empty. The simpler rule below alerts on the built-in perf_sentinel_findings_total counter instead. It is a Prometheus alerting rule, whose alerts Alertmanager then routes:
groups:
  - name: perf-sentinel
    rules:
      - alert: PerfSentinelCriticalFinding
        expr: perf_sentinel_findings_total{severity="critical"} > 0
        for: 2m
        labels:
          severity: page
        annotations:
          summary: "perf-sentinel detected a critical performance anti-pattern"
          description: |
            Critical finding count is {{ $value }}.
            Query `/api/findings?severity=critical` on the daemon for details.
The built-in Prometheus scrape endpoint at /metrics already exposes perf_sentinel_findings_total{type,severity} as a counter, so you do not need the query API for counting alerts. Use the query API to fetch the payload (template, trace ID, suggestion) that the alert handler includes in the notification.
Custom Grafana dashboard via the JSON datasource
Install the Grafana JSON API datasource plugin, point it at the daemon, and build per-service tables. Example panel query returning the 20 most recent findings for order-svc:
URL: http://perf-sentinel.internal:4318/api/findings
Method: GET
Params: service=order-svc
limit=20
Fields: $.finding.type,
$.finding.severity,
$.finding.pattern.template,
$.finding.pattern.occurrences,
$.finding.source_endpoint,
$.stored_at_ms
Pair this with the Prometheus /metrics endpoint already exposed by the daemon for time-series trends and use the query API for the list of concrete findings the user can click into.
SRE runbook: page on a stuck scraper
If your daemon has any opt-in scraper configured ([green.scaphandre], [green.cloud], [green.electricity_maps], [pg_stat]), active_traces and stored_findings values that stay at zero well after startup are a strong signal that ingestion has stalled. A bash snippet to embed in an on-call runbook:
#!/usr/bin/env bash
set -euo pipefail
DAEMON="${DAEMON:-http://127.0.0.1:4318}"
response=$(curl -sSf --max-time 3 "${DAEMON}/api/status")
uptime=$(echo "$response" | jq -r '.uptime_seconds')
traces=$(echo "$response" | jq -r '.active_traces')
findings=$(echo "$response" | jq -r '.stored_findings')
if [ "$uptime" -gt 300 ] && [ "$traces" -eq 0 ] && [ "$findings" -eq 0 ]; then
echo "perf-sentinel daemon has been idle for ${uptime}s with no traces or findings"
echo "Check ingestion path: OTLP endpoint, collector config, Java agent env vars"
exit 1
fi
Wire this to PagerDuty or OpsGenie via the on-call escalation tool of your choice.
Stability contract
The query API carries a stability promise starting at v0.4.1.
What is stable:
- All paths listed in Endpoint overview.
- All fields listed in the endpoint sections above. Field names and shapes will not be renamed, removed or retyped in a minor release.
- Enum values (finding.type, finding.severity, finding.confidence, io_intensity_band and so on): existing variants remain. New variants may be added in minor releases. Clients must tolerate unknown enum values and not crash on them.
- The behavior of the error responses listed in Error responses.
What may change in a minor release:
- New optional fields may be added to any JSON object.
- New enum variants may be added.
- New endpoints under /api/... may be introduced.
- Default values (e.g. limit=100) may be tuned if profiling shows a better default, but the hard cap (1000) will not shrink.
What requires a major release:
- Removing or renaming any field.
- Retyping a field (e.g. turning a number into a string).
- Shrinking the hard cap on /api/findings?limit=.
- Changing the authentication surface (the current contract is unauthenticated loopback-only by default).
Client guidance:
- Always tolerate unknown fields in JSON objects.
- Never parse enum variants exhaustively without a fallback branch.
- Pin the daemon version in your CI/CD manifests and review the CHANGELOG.md before bumping.
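The first two points of the client guidance can be sketched with a standard-library enum that maps any unrecognized value to a local fallback. The names Severity, UNKNOWN and parse_finding are hypothetical. UNKNOWN is a client-side placeholder and not a value the daemon emits.

```python
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"
    WARNING = "warning"
    INFO = "info"
    UNKNOWN = "unknown"  # local fallback only, never sent by the daemon

    @classmethod
    def _missing_(cls, value):
        # Called by Enum when the value matches no member: degrade, do not raise.
        return cls.UNKNOWN


def parse_finding(raw: dict) -> dict:
    """Pick the fields this client needs and ignore everything else."""
    return {
        "type": raw.get("type"),  # kept as a plain string: new variants pass through
        "severity": Severity(raw.get("severity")),
        "trace_id": raw.get("trace_id"),
    }
```

Reading only named keys means fields added in a later minor release are ignored without any code change.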
See also
- Integration for the overall deployment topology.
- Configuration for [daemon] and [daemon.correlation] settings.
- 06 · Ingestion & daemon for the daemon's internal design.