Configuration reference
perf-sentinel is configured via a .perf-sentinel.toml file. All fields are optional and have sensible defaults.
<img alt="CLI commands overview" src="https://raw.githubusercontent.com/robintra/perf-sentinel/main/docs/diagrams/svg/cli-commands.svg">
Contents
- Subcommands: which subcommands read .perf-sentinel.toml.
- Sections: full per-section reference ([thresholds], [detection], [green], [daemon], ...).
- Minimal configuration: the smallest useful .perf-sentinel.toml.
- Full configuration example: every section populated with example values.
- Migration from 0.5.x: the 8 legacy top-level keys removed in 0.6.0 and how to migrate.
- Environment variables: which env vars override config-file values.
Subcommands
| Subcommand | Description |
|---|---|
analyze | Batch analysis of trace files. Reads from file or stdin |
explain | Tree view of a specific trace with findings annotated inline |
watch | Daemon mode: real-time OTLP ingestion and streaming detection |
query | Query a running daemon for findings, correlations or status. Colored text output by default, --format json for scripting. query inspect opens a live TUI |
demo | Run analysis on an embedded demo dataset |
bench | Benchmark throughput on a trace file |
pg-stat | Analyze pg_stat_statements exports (CSV/JSON or Prometheus) |
inspect | Interactive TUI to browse traces, findings and span trees |
diff | Compare two trace sets and emit a delta report (new/resolved findings, severity changes, per-endpoint I/O op deltas). Text/JSON/SARIF output |
report | Single-file HTML dashboard for post-mortem exploration in any browser. Accepts a trace file, a pre-computed Report JSON, or stdin via --input - (auto-detects array-of-events vs Report object, BOM-tolerant) |
tempo | Fetch traces from a Grafana Tempo HTTP API (single trace by ID or search-then-fetch by service) and pipe them through the analysis pipeline. Gated behind the tempo feature |
calibrate | Correlate a trace file with measured energy readings (Scaphandre, cloud monitoring CSV) and emit a TOML of I/O-to-energy coefficients to load via [green] calibration_file |
Sections
[thresholds]
Quality gate thresholds. The quality gate fails if any rule is violated.
| Field | Type | Default | Description |
|---|---|---|---|
n_plus_one_sql_critical_max | integer | 0 | Maximum number of critical N+1 SQL findings before the gate fails |
n_plus_one_http_warning_max | integer | 3 | Maximum number of warning or higher N+1 HTTP findings before the gate fails |
io_waste_ratio_max | float | 0.30 | Maximum I/O waste ratio (0.0 to 1.0) before the gate fails |
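The gate rule can be illustrated with a minimal sketch. It is not perf-sentinel's implementation: the `Thresholds` dataclass, the `gate_passes` helper and its input shape are hypothetical, and it assumes a rule is violated once the observed value exceeds its configured maximum. Only the three field names and defaults come from the table above.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    # Field names and defaults mirror the [thresholds] table.
    n_plus_one_sql_critical_max: int = 0
    n_plus_one_http_warning_max: int = 3
    io_waste_ratio_max: float = 0.30


def gate_passes(sql_critical: int, http_warning_or_higher: int,
                io_waste_ratio: float, t: Thresholds = Thresholds()) -> bool:
    """Hypothetical helper: True when no rule is violated.

    Assumes a rule is violated when the observed value exceeds its maximum.
    """
    return (
        sql_critical <= t.n_plus_one_sql_critical_max
        and http_warning_or_higher <= t.n_plus_one_http_warning_max
        and io_waste_ratio <= t.io_waste_ratio_max
    )
```

With the defaults, a single critical N+1 SQL finding is enough to fail the gate, while up to three warning-or-higher N+1 HTTP findings are tolerated.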
[detection]
Detection algorithm parameters.
| Field | Type | Default | Description |
|---|---|---|---|
n_plus_one_min_occurrences | integer | 5 | Minimum number of occurrences (with distinct params) to flag an N+1 pattern |
window_duration_ms | integer | 500 | Time window in milliseconds within which repeated operations are considered an N+1 pattern |
slow_query_threshold_ms | integer | 500 | Duration threshold in milliseconds above which an operation is considered slow |
slow_query_min_occurrences | integer | 3 | Minimum number of slow occurrences of the same template to generate a finding |
max_fanout | integer | 20 | Maximum child spans per parent before flagging as excessive fanout (range: 1-100000) |
chatty_service_min_calls | integer | 15 | Minimum HTTP outbound calls per trace to flag as chatty service. Severity: warning > threshold, critical > 3x threshold. |
pool_saturation_concurrent_threshold | integer | 10 | Peak concurrent SQL spans per service to flag connection pool saturation risk. Uses a sweep-line algorithm on span timestamps. |
serialized_min_sequential | integer | 3 | Minimum sequential independent sibling calls (same parent, no time overlap, different templates) to flag as potentially parallelizable. |
sanitizer_aware_classification | string | "auto" | How to classify SQL groups whose literals were collapsed to a placeholder (?, $?, %s, @param, :name) by an OTel agent or database driver. One of "auto", "strict", "always", "never". See note below. |
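The sweep-line idea behind pool_saturation_concurrent_threshold can be sketched as follows. This is a generic illustration of the technique, not the project's code: the `peak_concurrency` name, the `(start_ms, end_ms)` tuple shape and the tie-breaking rule at equal timestamps are assumptions.

```python
def peak_concurrency(spans):
    """Return the peak number of overlapping (start_ms, end_ms) intervals.

    `spans` is a list of (start_ms, end_ms) tuples for one service.
    """
    events = []
    for start, end in spans:
        events.append((start, 1))   # span opens
        events.append((end, -1))    # span closes
    # Sort by time; at equal timestamps closes (-1) sort before opens (+1),
    # so back-to-back spans are not counted as overlapping (an assumption).
    events.sort(key=lambda e: (e[0], e[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak
```

The peak would then be compared against pool_saturation_concurrent_threshold (default 10) for each service.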
sanitizer_aware_classification
OpenTelemetry agents and database drivers ship with SQL statement sanitization ON by default to keep PII out of trace attributes. The placeholder style depends on the stack: JDBC agents produce bare ?, PostgreSQL native drivers (pgx, asyncpg, sqlx) produce $1/$2 (normalized to $?), Python DB-API drivers produce %s, .NET drivers produce @p0/@Name, and Oracle/SQLAlchemy produce :name. In all cases the spans reach perf-sentinel with the same template and no extractable parameters, so the standard distinct-params rule rejects the group and the redundant detector picks it up as redundant_sql instead of n_plus_one_sql. This setting controls the heuristic that recovers the correct classification:
- "auto" (default): emit n_plus_one_sql when either the ORM scope signal (Spring Data, Hibernate, EF Core, SQLAlchemy, ActiveRecord, GORM, Prisma, Diesel, ...) or the per-span timing variance is high enough to indicate distinct row lookups. Otherwise leave the group to the redundant detector. Best recall on production Spring Data, EF Core and similar ORM stacks.
- "strict": reclassify only when a primary signal (ORM scope marker, high occurrence >= 3 x n_plus_one_min_occurrences, or sequential siblings) fires conjointly with a corroborating signal (high timing variance or high occurrence). Preserves redundant_sql precision on moderate-count cached identical queries (legacy polling loops, unmemoized config lookups, typically 5-10 calls per request). Above the high-occurrence bar (default 15), any sanitized group fires regardless of ORM scope, sequential siblings, or variance, under the looks_sanitized guard. Use this when actionable redundant_sql findings are valuable signal that should not be silently absorbed into n_plus_one_sql.
- "always": reclassify any sanitized group with at least n_plus_one_min_occurrences spans as n_plus_one_sql. Aggressive, may flip a real single-param redundancy.
- "never": disable the heuristic entirely and fall back to the strict distinct_params check.
Findings reclassified by the heuristic (whether under "auto", "strict", or "always") carry classification_method = "sanitizer_heuristic" in their JSON representation so operators can spot where it is firing. Findings produced by the standard rule omit the field.
[green]
See also. The Energy and SCI primer in the methodology doc defines SCI v1.0 (E + I + M terms), RAPL, Scaphandre, SPECpower, Boavizta and the Electricity Maps API used by the config sections below. Read it once if any term feels unfamiliar.
GreenOps scoring configuration aligned with SCI v1.0 (operational + embodied terms, confidence intervals, multi-region).
| Field | Type | Default | Description |
|---|---|---|---|
enabled | boolean | true | Enable GreenOps scoring (IIS, waste ratio, top offenders, CO₂) |
default_region | string | (none) | Fallback cloud region used when neither the span's cloud.region attribute nor the service_regions mapping resolves a region. Examples: "eu-west-3", "us-east-1", "FR" |
embodied_carbon_per_request_gco2 | float | 0.001 | SCI v1.0 M term: hardware manufacturing emissions amortized per request (per trace), in gCO₂eq. Region-independent. Set to 0.0 to disable embodied carbon |
use_hourly_profiles | boolean | true | When true, the scoring stage uses time-of-day-specific grid intensities for the 30+ regions with embedded hourly profiles. Regions with monthly x hourly profiles (FR, DE, GB, US-East) also account for seasonal variation. Reports are tagged model = "io_proxy_v3" (monthly x hourly) or "io_proxy_v2" (flat-year hourly). Set to false to pin reports to the flat-annual model |
hourly_profiles_file | string | (none) | Path to a JSON file with user-supplied hourly profiles. Can be absolute or relative to the config file. Profiles in this file take precedence over embedded profiles for the same region key. See "User-supplied profiles" below |
per_operation_coefficients | boolean | true | When true, the proxy model weights energy per I/O op by operation type: SQL SELECT (0.5x), INSERT/UPDATE (1.5x), DELETE (1.2x) and HTTP payload size tiers (small <10 KB: 0.8x, medium 10 KB-1 MB: 1.2x, large >1 MB: 2.0x). Does not apply when Scaphandre or cloud SPECpower measured energy is available. Set to false to use the flat ENERGY_PER_IO_OP_KWH for all operations |
include_network_transport | boolean | false | When true, adds a network transport energy term for cross-region HTTP calls. Requires response_size_bytes on HTTP spans (OTel http.response.body.size attribute) and callee region mapped via [green.service_regions]. Same-region calls are excluded. Transport CO₂ appears as transport_gco2 in the JSON report |
network_energy_per_byte_kwh | float | 4e-11 | Energy per byte for network transport (kWh/byte). Default 0.04 kWh/GB, a conservative upper bound for cross-region server traffic (see Limitations, network transport). Only used when include_network_transport = true |
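As a worked example of the transport term, the sketch below multiplies the response size by the per-byte coefficient and skips same-region calls. The `transport_energy_kwh` helper and its parameters are hypothetical; only the default coefficient and the same-region exclusion come from the table above.

```python
NETWORK_ENERGY_PER_BYTE_KWH = 4e-11  # default from the [green] table


def transport_energy_kwh(response_size_bytes: int, caller_region: str,
                         callee_region: str,
                         per_byte_kwh: float = NETWORK_ENERGY_PER_BYTE_KWH) -> float:
    """Hypothetical helper: transport energy for one HTTP call.

    Same-region calls contribute nothing, as stated in the table.
    """
    if caller_region == callee_region:
        return 0.0
    return response_size_bytes * per_byte_kwh
```

A 1 GB (10^9 bytes) cross-region response therefore accounts for about 0.04 kWh, which matches the "0.04 kWh/GB" figure quoted for the default.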
[green.service_regions]
Per-service region overrides used when OTel cloud.region is absent from spans (e.g. Jaeger / Zipkin ingestion). Maps service name → region key.
[green]
default_region = "eu-west-3"
embodied_carbon_per_request_gco2 = 0.001
[green.service_regions]
"order-svc" = "us-east-1"
"chat-svc" = "ap-southeast-1"

Region resolution chain
For each span, the carbon scoring stage resolves the effective region in this order (first match wins):
1. event.cloud_region: from the OTel cloud.region resource attribute (or span attribute as fallback). Most authoritative.
2. [green.service_regions][event.service]: per-service config override.
3. [green] default_region: global fallback.
I/O ops with no resolvable region land in a synthetic "unknown" bucket (zero operational CO₂; the row appears in regions[] for visibility). Embodied carbon is still emitted because hardware manufacturing emissions are region-independent. The region cardinality is capped at 256 distinct buckets; excess values fold into the unknown bucket to prevent memory exhaustion from misconfigured ingestion.
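The resolution order can be sketched as a small function. The `resolve_region` name and its argument shape are hypothetical, and the 256-bucket cardinality cap is deliberately left out of this illustration.

```python
from typing import Optional

UNKNOWN_REGION = "unknown"


def resolve_region(cloud_region: Optional[str], service: str,
                   service_regions: dict,
                   default_region: Optional[str]) -> str:
    """Hypothetical helper mirroring the three-step chain; first match wins."""
    if cloud_region:                      # 1. OTel cloud.region on the event
        return cloud_region
    if service in service_regions:        # 2. [green.service_regions] override
        return service_regions[service]
    if default_region:                    # 3. [green] default_region
        return default_region
    return UNKNOWN_REGION                 # synthetic bucket, zero operational CO2
```

Using the example config above, a span from order-svc without cloud.region resolves to "us-east-1", and a span from an unmapped service resolves to "eu-west-3".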
Output shape
When green scoring is enabled and at least one event is analyzed, the JSON report's green_summary includes:
- co2: structured { total, avoidable, operational_gco2, embodied_gco2 } object. Both total and avoidable are { low, mid, high, model, methodology } with 2× multiplicative uncertainty (low = mid/2, high = mid×2). The methodology tag distinguishes total ("sci_v1_numerator": (E × I) + M summed over traces or "sci_v1_numerator+transport" when network transport energy is included) from avoidable ("sci_v1_operational_ratio": region-blind global ratio, excludes embodied). model values, most precise wins: "electricity_maps_api" > "scaphandre_rapl" > "kepler_ebpf" > "redfish_bmc" > "cloud_specpower" > "io_proxy_v3" > "io_proxy_v2" > "io_proxy_v1". When calibration factors are active on proxy models, +cal is appended (e.g. "io_proxy_v2+cal"). The +cal suffix never applies to a measured tag.
- regions[]: per-region breakdown with { region, grid_intensity_gco2_kwh, pue, io_ops, co2_gco2, intensity_source }, sorted by co2_gco2 descending (highest-impact regions first) with alphabetical tiebreak. intensity_source is "annual", "hourly", "monthly_hourly" or "real_time" (Electricity Maps API) depending on which carbon intensity source was used for the region.
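The uncertainty bounds and the model tag can be illustrated with a short sketch. Both helpers are hypothetical, and `report_model` assumes the report-level tag is the most precise model that contributed to the run, which is one reading of "most precise wins".

```python
MODEL_PRECEDENCE = [
    "electricity_maps_api", "scaphandre_rapl", "kepler_ebpf", "redfish_bmc",
    "cloud_specpower", "io_proxy_v3", "io_proxy_v2", "io_proxy_v1",
]


def co2_interval(mid_gco2: float) -> dict:
    """2x multiplicative bounds around the mid estimate."""
    return {"low": mid_gco2 / 2, "mid": mid_gco2, "high": mid_gco2 * 2}


def report_model(models_used: list, calibrated: bool) -> str:
    """Pick the most precise tag; append +cal only to proxy models."""
    best = min(models_used, key=MODEL_PRECEDENCE.index)
    if calibrated and best.startswith("io_proxy_"):
        best += "+cal"
    return best
```

A mid estimate of 4.0 gCO₂eq is reported as low 2.0 and high 8.0, and a calibrated run that only used the flat-year hourly proxy is tagged "io_proxy_v2+cal".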
Carbon intensity data is embedded in the binary (no network egress). See 05 · GreenOps & carbon for the complete formula and methodology and Limitations for the directional / non-regulatory disclaimer.
User-supplied hourly profiles
Set [green] hourly_profiles_file to a JSON file to provide your own hourly profiles. This is useful for datacenter operators with their own power purchase agreements (PPAs) or for overriding the embedded data with local measurements.
{
"profiles": {
"my-datacenter": {
"type": "flat_year",
"hours": [45.0, 44.0, 43.0, "... 24 values total ..."]
},
"eu-west-3": {
"type": "monthly",
"months": [
[50.0, 49.0, "... 24 values for January ..."],
["... 11 more months ..."]
]
}
}
}

User-supplied profiles take precedence over embedded profiles for the same region key. Validation at config load: each flat_year must have exactly 24 values, each monthly must have exactly 12 arrays of 24 values. All values must be finite and non-negative. If the region key exists in the embedded carbon table, a warning is logged when the profile mean deviates more than 5% from the annual value, but the profile is still accepted.
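To check a profile file before handing it to perf-sentinel, the validation rules can be approximated with a sketch like the one below. The function names and error messages are hypothetical; the real loader may report problems differently. `mean_deviates` assumes a positive annual value.

```python
import math


def validate_profile(profile: dict) -> None:
    """Raise ValueError if a profile breaks the documented shape rules."""
    kind = profile.get("type")
    if kind == "flat_year":
        days = [profile.get("hours", [])]
    elif kind == "monthly":
        days = profile.get("months", [])
        if len(days) != 12:
            raise ValueError("monthly profile needs exactly 12 arrays")
    else:
        raise ValueError(f"unknown profile type: {kind!r}")
    for hours in days:
        if len(hours) != 24:
            raise ValueError("each array needs exactly 24 values")
        for v in hours:
            if isinstance(v, bool) or not isinstance(v, (int, float)):
                raise ValueError("values must be numbers")
            if not math.isfinite(v) or v < 0:
                raise ValueError("values must be finite and non-negative")


def mean_deviates(hours: list, annual_value: float,
                  tolerance: float = 0.05) -> bool:
    """True when the profile mean is more than 5% away from the annual value."""
    mean = sum(hours) / len(hours)
    return abs(mean - annual_value) / annual_value > tolerance
```

Each entry under "profiles" in the parsed JSON would be passed to `validate_profile`; a deviation reported by `mean_deviates` corresponds to the logged warning, not to a rejection.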
Hourly profile region aliases
Country-code aliases and cloud-provider synonyms are resolved to the same hourly profile. For example, "fr", "francecentral" and "europe-west9" all map to the eu-west-3 (France) profile. Notable mappings:
- "us", "eastus" -> us-east-1 (US-East, the most common US deployment region)
- "westeurope", "nl" -> eu-west-4 (Netherlands)
- "northeurope", "ie" -> eu-west-1 (Ireland)
- "uksouth", "gb", "uk" -> eu-west-2 (UK)
- "westus2" -> us-west-2 (Oregon)
The full alias table is in score/carbon_profiles.rs. If your region key is not aliased, the flat annual value from the primary carbon table is used.
[green.scaphandre] (optional, opt-in)
Opt-in integration with Scaphandre for per-process energy measurement on Linux hosts with Intel RAPL support. When configured, the watch daemon spawns a background task that scrapes the Scaphandre Prometheus endpoint every scrape_interval_secs and uses the measured power readings to replace the fixed ENERGY_PER_IO_OP_KWH constant for each mapped service.
| Field | Type | Default | Description |
|---|---|---|---|
endpoint | string | (none) | Full URL of the Scaphandre Prometheus /metrics endpoint. Must start with http:// or https:// (TLS supported via hyper-rustls). Required when the section is present |
scrape_interval_secs | integer | 5 | How often to scrape, in seconds. Valid range: 1-3600 |
process_map | table | {} | Maps perf-sentinel service names (from span service.name) to a per-service ProcessMatcher (see below) |
Each process_map entry is a table with two fields: exe_contains (required, substring matched against the Scaphandre exe label) and cmdline_contains (optional, substring matched against the cmdline label). The matcher requires both substrings to be present when cmdline_contains is set. Exactly one Scaphandre process must match per entry, otherwise the scoring stage skips that service for the tick and emits a warn log naming the ambiguity.
[green.scaphandre]
endpoint = "http://localhost:8080/metrics"
scrape_interval_secs = 5
[green.scaphandre.process_map."order-svc"]
exe_contains = "bin/java"
cmdline_contains = "order-svc.jar"
[green.scaphandre.process_map."chat-svc"]
exe_contains = "bin/java"
cmdline_contains = "chat-svc.jar"
[green.scaphandre.process_map."native-svc"]
exe_contains = "/opt/native-svc/bin/native-svc"

Why both exe_contains and cmdline_contains. Scaphandre emits exe as an absolute path of the runtime (/usr/lib/jvm/.../bin/java, /usr/share/dotnet/dotnet). Several co-located services sharing a runtime (multiple JVMs, multiple .NET assemblies) collide on exe, and only cmdline discriminates them. Real Scaphandre also concatenates argv without separators: java -jar /tmp/order-svc.jar is emitted as cmdline="java-jar/tmp/order-svc.jar". Configure cmdline_contains with a substring that appears in this concatenated form (e.g. the jar/dll filename), NOT with a POSIX command line containing spaces.
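The matching rule, including the "exactly one process" requirement, can be sketched as below. The `matches` and `select_process` helpers and the dictionary shape of a process are hypothetical; they only illustrate substring matching against the concatenated cmdline form.

```python
from typing import Optional


def matches(exe: str, cmdline: str, exe_contains: str,
            cmdline_contains: Optional[str] = None) -> bool:
    """Both substrings must be present when cmdline_contains is set."""
    if exe_contains not in exe:
        return False
    return cmdline_contains is None or cmdline_contains in cmdline


def select_process(processes: list, exe_contains: str,
                   cmdline_contains: Optional[str] = None):
    """Return the single matching process, or None when zero or several match."""
    hits = [p for p in processes
            if matches(p["exe"], p["cmdline"], exe_contains, cmdline_contains)]
    return hits[0] if len(hits) == 1 else None
```

With two JVMs on the host, matching on exe_contains = "bin/java" alone is ambiguous and yields no process, while adding cmdline_contains = "order-svc.jar" selects exactly one. A pattern with spaces such as "-jar /tmp/order-svc.jar" never matches the concatenated cmdline.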
Ignored in analyze batch mode. Only the watch daemon spawns the scraper. The analyze command always uses the proxy model regardless of this section.
Fallback behaviour. When the endpoint is unreachable, a service is not present in process_map or a service had zero ops in the current scrape window, the scoring stage falls back to the proxy model for those spans. The first failure logs at warn level; subsequent failures log at debug to avoid spam. The perf_sentinel_scaphandre_last_scrape_age_seconds Prometheus gauge lets operators detect a hung scraper.
Precision bounds (important). Scaphandre improves the per-service energy coefficient but does NOT give per-finding attribution. RAPL is process-level, not span-level: two findings in the same process during the same scrape window share the same coefficient. See Limitations for the full discussion.
[green.kepler] (optional, opt-in)
Opt-in integration with Kepler (CNCF sandbox) for per-container or per-process energy measurement via eBPF. Unlike Scaphandre, Kepler works on ARM64 (Graviton, Ampere, Apple Silicon, Cobalt 100) with degraded precision but a real signal. When configured, the watch daemon scrapes Kepler's Prometheus /metrics endpoint, computes a per-service joules delta vs the previous scrape, and publishes a measured per-op coefficient tagged kepler_ebpf.
| Field | Type | Default | Description |
|---|---|---|---|
endpoint | string | (none) | Full URL of the Kepler Prometheus /metrics endpoint. Required when the section is present |
scrape_interval_secs | int | 5 | How often to scrape, in seconds. Valid range: 1-3600 |
metric_kind | string | "container" | Which Kepler v2 counter to read: "container" (kepler_container_cpu_joules_total, keyed by container_name) or "process" (kepler_process_cpu_joules_total, keyed by comm) |
service_mappings | table | {} | Maps perf-sentinel service names to the Kepler label value identifying the same workload (container name for container, process command name for process) |
auth_header | string | (none) | Optional "Name: Value" header. Prefer PERF_SENTINEL_KEPLER_AUTH_HEADER env var |
[green.kepler]
endpoint = "http://kepler.kube-system.svc.cluster.local:9102/metrics"
scrape_interval_secs = 5
metric_kind = "container"
[green.kepler.service_mappings]
"order-svc" = "order-svc-deployment"
"chat-svc" = "chat"

Ignored in analyze batch mode. Like Scaphandre, only watch spawns the scraper.
Precedence vs Scaphandre. Scaphandre RAPL outranks Kepler eBPF on x86_64 with RAPL access. The Kepler integration shines on ARM64 where Scaphandre is unavailable. See Limitations for the ARM eBPF accuracy caveats (Kepler upstream issue #1556).
Production deployment shape. Kepler typically runs as a Kubernetes DaemonSet, one pod per node. In a multi-node cluster the endpoint should point at an upstream Prometheus that scrapes the whole DaemonSet rather than a single pod, otherwise only one node's energy will be visible. Prometheus-mediated mode (PromQL queries) is reserved for a follow-up release.
[green.redfish] (optional, opt-in)
Opt-in integration with the Redfish BMC standard for bare-metal wall-plug power readings. Unlike Scaphandre and Kepler (which measure CPU + DRAM only), Redfish reads the actual power supply output via the BMC, so periphery (NIC, drives, fans, PSU overhead) is included. Bare-metal only, no cloud VMs.
| Field | Type | Default | Description |
|---|---|---|---|
endpoints | table | (empty) | Map of chassis_id → endpoint table with url + schema. Required to activate the scraper |
scrape_interval_secs | int | 60 | How often to scrape each chassis. Valid range: 15-3600 (BMC rate-limit defense, several BMCs throttle below 30s) |
service_mappings | table | {} | Maps perf-sentinel service names to the chassis hosting them. Every service mapped to the same chassis receives the same chassis-level coefficient |
ca_bundle_path | string | (none) | Reserved for a follow-up. Setting this field today causes the scraper to refuse to start with a clear error. Self-signed BMC certs are not supported in this release |
auth_header | string | (none) | Curl-style Basic auth header. Prefer PERF_SENTINEL_REDFISH_AUTH_HEADER env var. Session-token auth (POST /SessionService/Sessions) is not yet supported |
Each endpoint table has two fields: url (string, full Redfish URL including path) and schema (string, either "legacy_power" or "environment_metrics"). The schema selects the canonical JSON pointer the parser uses, no operator-typed pointer involved:
| schema | Path served by BMC | JSON pointer parser reads |
|---|---|---|
legacy_power | /redfish/v1/Chassis/{id}/Power | /PowerControl/0/PowerConsumedWatts |
environment_metrics | /redfish/v1/Chassis/{id}/EnvironmentMetrics | /PowerWatts/Reading |
[green.redfish]
scrape_interval_secs = 60
[green.redfish.endpoints."chassis-legacy-1"]
url = "https://bmc-rack-01.dc.example/redfish/v1/Chassis/1/Power"
schema = "legacy_power"
[green.redfish.endpoints."chassis-modern-1"]
url = "https://bmc-rack-02.dc.example/redfish/v1/Chassis/1/EnvironmentMetrics"
schema = "environment_metrics"
[green.redfish.service_mappings]
"order-svc" = "chassis-legacy-1"
"chat-svc" = "chassis-legacy-1"
"ledger-svc" = "chassis-modern-1"

Which schema to choose. /Power (legacy_power) was deprecated by DMTF Release 2020.4 but is still mandatory on BMC firmware as of 2026, every shipping vendor exposes it. /EnvironmentMetrics (environment_metrics) is the modern replacement that carries PowerWatts.Reading directly, present alongside /Power during the transition. Pick legacy_power unless your BMC documentation explicitly recommends EnvironmentMetrics. A mixed fleet is declared by giving each chassis the schema its firmware serves.
Ignored in analyze batch mode. Like Scaphandre and Kepler, only watch integrates Redfish.
Node-level coefficient. Every service mapped to the same chassis receives the same coefficient. Two services on one chassis will never get distinct measured per-op values via Redfish. See Limitations for the full discussion of this trade-off and the vendor-specific JSON response variance.
[green.cloud] (optional, opt-in)
Cloud-native energy estimation via CPU utilization + SPECpower interpolation. When configured, the watch daemon scrapes CPU% from a Prometheus/VictoriaMetrics endpoint and uses an embedded lookup table (idle/max watts per cloud instance type) to estimate per-service energy consumption. Supports AWS, GCP, Azure and on-premise hardware with manual watts override.
| Field | Type | Default | Description |
|---|---|---|---|
prometheus_endpoint | string | (none) | Prometheus HTTP API base URL (e.g. http://prometheus:9090 or https://prometheus:9090). TLS supported via hyper-rustls. Required. |
scrape_interval_secs | integer | 15 | Polling interval in seconds (range: 1-3600). |
default_provider | string | (none) | Default cloud provider: "aws", "gcp", "azure". |
default_instance_type | string | (none) | Fallback instance type for unmapped services. |
cpu_metric | string | (none) | Default PromQL metric/query for CPU utilization. |
Per-service entries in [green.cloud.services] support two forms:
Cloud instance (table lookup):
[green.cloud]
prometheus_endpoint = "http://prometheus:9090"
scrape_interval_secs = 15
default_provider = "aws"
[green.cloud.services]
"account-svc" = { provider = "aws", instance_type = "m7i.4xlarge" } # Sapphire Rapids
"api-asia" = { provider = "gcp", instance_type = "c4d-standard-8" } # AMD Turin
"analytics" = { provider = "azure", instance_type = "Standard_D8s_v6" } # Emerald Rapids
"ml-bench" = { provider = "aws", instance_type = "m8g.4xlarge" } # Graviton 4

Modern instance families covered include AWS m7i/c7i/r7i, m7a/c7a, m6a/c6a, m7g/c7g, m8g/c8g; GCP c3, c3d, c4, c4d, n2d, t2a; Azure Standard_Dv6, Standard_Dadsv6, Standard_Dpsv6 (Cobalt 100), Standard_Ev6. One CPU-named bare-metal entry covers Sierra Forest (xeon-6780e, system-level watts assuming full chip ownership).
Manual watts (on-premise or custom hardware):
[green.cloud.services]
"my-service" = { idle_watts = 45, max_watts = 120 }

Ignored in analyze batch mode. Only the watch daemon spawns the Prometheus scraper.
Fallback behaviour. If the Prometheus endpoint is unreachable, the daemon falls back to the proxy model for all cloud-configured services. Unknown instance types fall back to a provider-level default.
Precision bounds. The SPECpower interpolation model has approximately +/-30% accuracy, better than the proxy model but less precise than Scaphandre RAPL. See Limitations for details.
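The idle/max watts estimate can be illustrated with a sketch. It assumes a straight-line interpolation between idle_watts and max_watts, which may differ from the curve perf-sentinel actually applies. Both helper names are hypothetical.

```python
def estimate_watts(cpu_percent: float, idle_watts: float,
                   max_watts: float) -> float:
    """Linear interpolation between idle and max watts (an assumed curve)."""
    utilization = min(max(cpu_percent, 0.0), 100.0) / 100.0  # clamp to 0..1
    return idle_watts + (max_watts - idle_watts) * utilization


def energy_kwh(watts: float, interval_secs: float) -> float:
    """Energy drawn over one scrape interval at constant power."""
    return watts * interval_secs / 3_600_000  # W*s = J; 3.6e6 J per kWh
```

For the manual-watts example above (idle 45 W, max 120 W), 50% CPU gives 82.5 W under this assumption, or roughly 0.00034 kWh over the default 15-second scrape interval.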
[green.electricity_maps] (optional, opt-in)
Real-time carbon intensity from the Electricity Maps API. Daemon-only.
| Field | Type | Default | Description |
|---|---|---|---|
api_key | string | none | API auth token. Prefer PERF_SENTINEL_EMAPS_TOKEN env var for security |
endpoint | string | https://api.electricitymaps.com/v4 | API base URL (http:// or https://). v3 still works but emits a deprecation warning at startup |
poll_interval_secs | integer | 300 | Poll interval in seconds (range: 60-86400). Free tier: use 3600+ |
emission_factor_type | string | lifecycle | Emission factor model. lifecycle (default) includes upstream emissions (manufacturing, transport). direct includes only combustion. Some Scope 2 frameworks prefer direct for stricter accountability |
temporal_granularity | string | hourly | API response aggregation. hourly (default), 5_minutes, or 15_minutes. Sub-hour values require a paid plan that exposes them, otherwise the API silently coarsens to hourly |
The region_map sub-table maps cloud regions to Electricity Maps zone codes:
[green.electricity_maps]
# Use PERF_SENTINEL_EMAPS_TOKEN env var instead of api_key in config
poll_interval_secs = 300
[green.electricity_maps.region_map]
"eu-west-3" = "FR"
"us-east-1" = "US-NY"
"ap-northeast-1" = "JP-TK"

Staleness: if the last successful poll is older than 3x poll_interval_secs, the scraper falls back to embedded hourly profiles.
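The staleness rule amounts to a simple age check, sketched here with a hypothetical `use_realtime_intensity` helper that takes timestamps in epoch seconds:

```python
def use_realtime_intensity(last_success_epoch_secs: float,
                           now_epoch_secs: float,
                           poll_interval_secs: int = 300) -> bool:
    """Real-time data is trusted until it is older than 3x the poll interval."""
    age = now_epoch_secs - last_success_epoch_secs
    return age <= 3 * poll_interval_secs
```

With the default poll_interval_secs = 300, real-time values stop being used once the last successful poll is more than 900 seconds old.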
Rate limits: the Electricity Maps free tier allows approximately 30 requests per month per zone. For free tier users, set poll_interval_secs = 3600 or higher. The default of 300s is intended for paid plans.
API version: the default endpoint targets v4 since perf-sentinel 0.5.11. v3 remains accepted (the response schema is identical on carbon-intensity/latest), but a deprecation warning is logged once at daemon startup. To silence the warning, set endpoint = "https://api.electricitymaps.com/v4" explicitly. To keep v3 deliberately (for example to A/B-validate against v4), leave endpoint = "https://api.electricitymaps.com/v3" and acknowledge the warning.
Unknown values for emission_factor_type and temporal_granularity: these two knobs use a fail-graceful parser. A typo or unsupported value (e.g. temporal_granularity = "5min" instead of "5_minutes") does not reject the config at load time. The value is sanitized, a tracing::warn! is emitted, and the daemon falls back to the default. Watch the daemon logs at startup if you suspect a typo: the warn line names the offending field and value.
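The fail-graceful behaviour can be sketched as a parse-or-default helper. This is an illustration in Python, not the daemon's Rust code: the `parse_or_default` function and the logger name are hypothetical. The accepted values and defaults come from the table above.

```python
import logging

log = logging.getLogger("perf_sentinel_sketch")  # hypothetical logger name

GRANULARITIES = {"hourly", "5_minutes", "15_minutes"}
EMISSION_FACTORS = {"lifecycle", "direct"}


def parse_or_default(field: str, value: str, allowed: set,
                     default: str) -> str:
    """Return the value if supported, else warn and fall back to the default."""
    if value in allowed:
        return value
    # %r quotes the raw value so odd characters stay visible in the log line.
    log.warning("unknown value %r for %s, falling back to %r",
                value, field, default)
    return default
```

Under this sketch, temporal_granularity = "5min" yields "hourly" plus a warning that names the field and the rejected value.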
Visibility in reports (since perf-sentinel 0.5.12): the active scoring configuration (API version, emission factor type, temporal granularity) is surfaced in three places so Scope 2 reporters can audit which carbon model produced the numbers without reading the operator's TOML.
- The JSON report carries a green_summary.scoring_config object with the 3 fields. Omitted when [green.electricity_maps] is not configured (additive on pre-0.5.12 baselines).
- The HTML dashboard renders a chip bandeau above the green-regions table. Default values (v4, lifecycle, hourly) are neutral chips, opt-in values (direct, 5_minutes, 15_minutes) are accent chips, the legacy v3 endpoint shows as a warning chip mirroring the deprecation warning. Native browser tooltips explain each value.
- The terminal print_green_summary output prepends a one-liner Carbon scoring: Electricity Maps v4, lifecycle, hourly before the per-region breakdown.
The bandeau and the terminal line are hidden when [green.electricity_maps] is not configured.
[green] calibration_file (optional)
Path to a calibration TOML file generated by perf-sentinel calibrate. When present, per-service calibration factors are loaded at config time and multiply the proxy model energy per op. Does not affect Scaphandre or cloud SPECpower measured energy.
[green]
calibration_file = ".perf-sentinel-calibration.toml"

perf-sentinel calibrate input size limits. Both inputs are capped to protect against unbounded memory use: the --traces file is capped at 1 GiB (the fixed batch cap since 0.8.7, same as analyze) and the --measured-energy CSV is capped at 64 MiB. Calibrate exits with a clear error if either file exceeds its limit. 64 MiB is generous for thousands of RAPL samples per minute; if you need more, file an issue describing the workload.
[tempo] (optional)
Configuration for the perf-sentinel tempo subcommand. The subcommand runs in batch mode (not daemon), fetches traces from a Grafana Tempo HTTP API and pipes them through the standard analysis pipeline. All values below can also be set via CLI flags (flags override config).
| Field | Type | Default | Description |
|---|---|---|---|
endpoint | string | none | Tempo HTTP API base URL (e.g. http://tempo:3200) |
max_traces | integer | 100 | Maximum traces to fetch in search mode |
[daemon]
Streaming mode (perf-sentinel watch) settings.
| Field | Type | Default | Description |
|---|---|---|---|
listen_address | string | "127.0.0.1" | IP address to bind for OTLP and metrics endpoints. Use 127.0.0.1 for local-only access. Warning: setting a non-loopback address exposes unauthenticated endpoints to the network, use a reverse proxy or network policy |
listen_port_http | integer | 4318 | Port for OTLP HTTP receiver and Prometheus /metrics endpoint (range: 1-65535) |
listen_port_grpc | integer | 4317 | Port for OTLP gRPC receiver (range: 1-65535) |
json_socket | string | "/tmp/perf-sentinel.sock" | Unix socket path for JSON event ingestion |
max_active_traces | integer | 10000 | Maximum number of traces held in memory. When exceeded, the oldest trace is evicted (LRU). Range: 1 to 1,000,000 |
trace_ttl_ms | integer | 30000 | Time-to-live for traces in milliseconds. Traces older than this are evicted and analyzed. Range: 100 to 3,600,000 |
sampling_rate | float | 1.0 | Fraction of traces to analyze (0.0 to 1.0). Set below 1.0 to reduce load in high-traffic environments |
max_events_per_trace | integer | 1000 | Maximum events stored per trace (ring buffer). Oldest events are dropped when exceeded. Range: 1 to 100,000 |
max_payload_size | integer | 16777216 | Maximum size in bytes for a single JSON payload (default: 16 MiB, raised from 1 MiB in 0.5.13 because a daemon snapshot from /api/export/report already exceeds 1 MiB on a modest cluster). Range: 1,024 to 104,857,600 (100 MiB). The default sits at the upper inclusive boundary of the comfort zone by design. Since 0.8.7 this caps daemon network payloads only: batch subcommands (analyze, diff, report, explain, calibrate, pg-stat, bench) read local input files under a fixed 1 GiB cap instead |
environment | string | "staging" | Deployment environment label. Accepted values: "staging" (default, medium confidence) or "production" (high confidence). Stamps every finding with the corresponding confidence field for downstream tooling (perf-lint planned). Case-insensitive; any other value is rejected at config load |
tls_cert_path | string | (absent) | Path to a PEM-encoded TLS certificate chain for the OTLP receivers. When set alongside tls_key_path, both gRPC and HTTP listeners use TLS. When absent, listeners use plain TCP. Each TLS listener caps concurrent in-flight handshakes at 128 (non-configurable) and drops peers that do not complete the handshake within 10 seconds |
tls_key_path | string | (absent) | Path to a PEM-encoded TLS private key. Must be set together with tls_cert_path (both or neither). On Unix, the daemon warns if the key file is readable by group or others |
api_enabled | boolean | true | Enable the daemon query API endpoints (/api/findings, /api/explain/{trace_id}, /api/correlations, /api/status). Set to false to disable the API while keeping OTLP ingestion and /metrics active |
max_retained_findings | integer | 10000 | Maximum number of recent findings retained in the daemon's ring buffer for the query API. Older findings are evicted when the limit is reached. Range: 0 to 10,000,000, where 0 disables the store entirely and reclaims its memory (recommended when api_enabled = false) |
ingest_queue_capacity | integer | 1024 | Capacity of the ingestion channel: span-event batches buffered between the listeners and the event loop. Once full, ingestion applies backpressure to producers. Raise it to absorb burstier traffic at the cost of memory. Range: 1 to 1,048,576 |
analysis_queue_capacity | integer | 1024 | Capacity of the analysis worker queue: evicted and expired batches awaiting detect+score. Once full, whole batches are shed and counted on perf_sentinel_analysis_shed_batches_total. Raise it to tolerate longer analysis bursts before shedding. Range: 1 to 1,048,576 |
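As an illustration of how these fields combine, the sketch below configures an ingestion-only daemon: OTLP over TLS, query API off, and the findings store disabled as recommended when api_enabled = false. The certificate paths are placeholders borrowed from the full example and are assumptions, not required locations.

```toml
[daemon]
# Stamp findings with high confidence.
environment = "production"

# Keep OTLP ingestion and /metrics, but do not mount the /api/* query endpoints.
api_enabled = false
# 0 disables the findings ring buffer and reclaims its memory.
max_retained_findings = 0

# TLS applies to both gRPC and HTTP listeners; set both fields or neither.
tls_cert_path = "/etc/tls/server-cert.pem"
tls_key_path = "/etc/tls/server-key.pem"
```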
Comfort zones and startup warnings
Each daemon limit has hard bounds, and a value outside them is rejected at config load. Any value inside the hard bounds is accepted, but perf-sentinel watch emits a one-shot WARN log at startup when the value falls outside the recommended comfort zone. The warning is informational: the daemon still runs. Use it as a sanity check that an unusual value was deliberate.
| Field | Comfort zone | Why values outside the zone are unusual |
|---|---|---|
max_payload_size | 256 KiB to 16 MiB | Smaller may reject legitimate OTLP batches; larger increases ingest latency and RSS |
max_active_traces | 1,000 to 100,000 | Smaller triggers aggressive LRU eviction; larger grows memory roughly linearly |
max_events_per_trace | 100 to 10,000 | Smaller truncates complex traces; larger rarely improves detection quality |
max_retained_findings | 100 to 100,000 (or 0) | Smaller evicts findings before /api/findings can serve them; larger holds a backlog. 0 disables the store and is silent |
trace_ttl_ms | 1,000 to 600,000 | Below 1s flushes traces before slow spans land; above 10min keeps near-dead traces |
max_fanout | 5 to 1,000 | Smaller floods the findings store with noise; larger suppresses most fanout detections |
Comfort zones judge the static value at startup. At runtime the daemon complements them with a settings advisor: when lifetime counters show a knob undersized for the observed load (queue sheds, ingest rejects, near-full trace window...), /api/export/report emits tuning entries in Report.warning_details naming the knob, its current value, and the suggested adjustment. See Metrics section "Warning kinds: transient vs sticky" for the rule table.
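To make the distinction between hard bounds and comfort zones concrete, here is a sketch of a config that loads successfully but should produce startup warnings. The values are illustrative only, chosen from the ranges documented above.

```toml
[daemon]
# 32 MiB: inside the hard range (1,024 to 104,857,600 bytes) but above the
# 16 MiB comfort-zone ceiling, so a one-shot WARN is expected at startup.
max_payload_size = 33554432

# 50: inside the hard range (0 to 10,000,000) but below the comfort-zone
# floor of 100, so a second WARN is expected. A value of 0 would be silent.
max_retained_findings = 50
```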
[daemon.correlation] (optional)
Cross-trace temporal correlation in daemon mode. When enabled, the daemon detects recurring co-occurrences between findings from different services or traces (e.g. "every time the N+1 in order-svc fires, pool saturation appears in payment-svc within 2 seconds").
| Field | Type | Default | Description |
|---|---|---|---|
enabled | boolean | false | Enable cross-trace correlation. Requires watch daemon mode with sustained traffic to produce useful results |
window_minutes | integer | 10 | Rolling window in minutes over which co-occurrences are tracked |
lag_threshold_ms | integer | 2000 | Maximum time lag in milliseconds between two findings to consider them co-occurring |
min_co_occurrences | integer | 3 | Minimum number of co-occurrences before a correlation is reported |
min_confidence | float | 0.5 | Minimum confidence score (0.0 to 1.0) to report a correlation. Computed as co_occurrence_count / total_occurrences_of_A |
max_tracked_pairs | integer | 1000 | Maximum number of finding pairs tracked simultaneously. Prevents unbounded memory growth from high-cardinality findings |
[daemon.correlation]
enabled = true
window_minutes = 10
lag_threshold_ms = 2000
min_co_occurrences = 3
min_confidence = 0.5
Correlations are exposed via GET /api/correlations (when api_enabled = true) and emitted as NDJSON on the daemon's stdout stream.
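The two reporting thresholds work together. The sketch below uses stricter values than the defaults and walks through a hypothetical pair of findings in the comments; the counts are invented for illustration.

```toml
[daemon.correlation]
enabled = true
window_minutes = 10
lag_threshold_ms = 2000
min_co_occurrences = 4
min_confidence = 0.6

# Hypothetical case within one 10-minute window:
#   finding A (N+1 in order-svc) fired 8 times
#   finding B (pool saturation in payment-svc) followed within 2000 ms on 5 of them
#   co-occurrences = 5, which exceeds min_co_occurrences = 4
#   confidence     = 5 / 8 = 0.625, which exceeds min_confidence = 0.6
# Both thresholds are cleared, so the A -> B correlation would be reported.
```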
[daemon.ack] (optional, since 0.5.20)
Daemon-side runtime ack store. Complements the CI TOML acks (see Acknowledgments) with a JSONL append-only file mutated through the HTTP API endpoints POST / DELETE /api/findings/{signature}/ack.
| Field | Type | Default | Description |
|---|---|---|---|
enabled | boolean | true | Enable the daemon ack endpoints. When false, POST / DELETE / GET /api/acks return 503 Service Unavailable, and GET /api/findings skips the ack filter |
storage_path | string | <data_local_dir>/perf-sentinel/acks.jsonl | Override for the JSONL file location. Resolved at runtime via dirs::data_local_dir() (XDG on Linux, Library/Application Support on macOS) when absent. The daemon refuses to start if the default cannot be resolved and no override is set; do not fall back to /tmp because the file holds audit data that must survive a reboot |
api_key | string | (absent) | Optional secret. When set, POST and DELETE on /api/findings/{signature}/ack require the X-API-Key header to match (constant-time compared via subtle). GET /api/acks and GET /api/findings stay unauthenticated by design (loopback reads). Empty string is rejected at config load |
toml_path | string | ".perf-sentinel-acknowledgments.toml" (CWD-relative) | Override for the CI TOML acks file the daemon reads at startup. Set to an absolute path for systemd or container deployments where CWD is not the repo root |
[daemon.ack]
enabled = true
storage_path = "/var/lib/perf-sentinel/acks.jsonl"
# api_key = "<rotate-this>"
toml_path = "/etc/perf-sentinel/acknowledgments.toml"
The JSONL file is replayed and atomically rewritten (via tmp + rename) at every daemon restart, so repeated ack / unack cycles cannot accumulate beyond their net active state. On Unix, the file is created with mode 0600 (owner read-write only).
[daemon.cors] (optional, since 0.5.23)
Cross-origin resource sharing for the daemon's /api/* query endpoints. Disabled by default (no Access-Control-Allow-Origin header is emitted, the loopback-only posture is preserved). Enable when a browser client needs to call the daemon, typically the HTML report in live mode (perf-sentinel report --daemon-url <URL>, see HTML report).
Scope: the CORS layer is wired only on the /api/* query API sub-router. The OTLP ingest path (/v1/traces), Prometheus exposition (/metrics), and liveness probe (/health) are NOT exposed cross-origin even under wildcard mode. Browser pages cannot post traces, scrape /metrics, or hit /health regardless of allowed_origins. This containment is intentional: browser clients have no legitimate use for those surfaces.
Read-endpoint exposure: every /api/* GET endpoint (/api/findings, /api/acks, /api/status, /api/correlations, /api/explain/*, /api/export/report) is unauthenticated by design, in line with the loopback-only posture pre-0.5.23. Once you whitelist an origin, any browser tab on that origin can read every finding signature, ack metadata, and trace export the daemon holds. Only whitelist origins you trust to view all daemon-resident data. Mixing untrusted origins with wildcard mode (["*", "https://x"]) is rejected at config load.
| Field | Type | Default | Description |
|---|---|---|---|
allowed_origins | array<string> | [] | List of origins permitted to call the daemon's /api/* surface. ["*"] is wildcard mode (development only, no credentials). A non-wildcard list whitelists exact origins. Each non-wildcard entry must be a full origin (scheme + host + optional port), no trailing slash |
Wildcard example (development):
[daemon.cors]
allowed_origins = ["*"]
Production example (whitelist):
[daemon.cors]
allowed_origins = [
"https://reports.example.com",
"https://gitlab.example.com",
]
Methods allowed: GET, POST, DELETE, OPTIONS. Headers allowed: Content-Type, X-API-Key. (X-User-Id is not advertised because the daemon does not enforce it server-side; the by field on an ack POST body is operator-attested only.) Preflight Access-Control-Max-Age is 120 seconds: long enough to amortize the OPTIONS roundtrip across a typical interaction, short enough that a tightened whitelist takes effect on the next browser preflight without a forced refresh.
The CORS layer does not set Access-Control-Allow-Credentials: true, which is incompatible with ["*"] and unnecessary because the daemon authenticates via the X-API-Key header rather than cookies. Browsers running on a non-whitelisted origin receive responses without the Access-Control-Allow-Origin header, so the request is blocked client-side without a daemon-side rejection.
Origins that fail to parse as a valid HTTP header value (typically a copy-paste with embedded control characters) are dropped at startup with a warn! log and the rest of the list is honored. If every entry is invalid, the layer is disabled entirely. If [daemon] api_enabled = false, the CORS layer is skipped (the /api/* sub-router is not mounted in the first place) and a warn! notes the unused config.
Since 0.5.27, combining allowed_origins = ["*"] with [daemon.ack] api_key also emits a startup warn!. Wildcard CORS plus an X-API-Key auth lets any browser origin replay a captured key through the daemon, even though no cookie or Allow-Credentials mode is in play. Whitelist explicit origins for production deployments where the API key is set.
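A production-oriented pairing that avoids this warning could look like the sketch below: one explicit origin plus an ack API key. The origin is an example hostname and the key is a placeholder, not a real value.

```toml
[daemon.cors]
# Exact origin: scheme + host (+ optional port), no trailing slash.
allowed_origins = ["https://reports.example.com"]

[daemon.ack]
# Required in the X-API-Key header for POST / DELETE on /api/findings/{signature}/ack.
# Placeholder value; an empty string is rejected at config load.
api_key = "<rotate-this>"
```

With this layout only pages served from the listed origin can call /api/*, and GET endpoints remain readable from that origin without the key.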
Minimal configuration
An empty file or no file at all uses all defaults. A minimal configuration for CI might only set thresholds:
[thresholds]
n_plus_one_sql_critical_max = 0
io_waste_ratio_max = 0.25
Full configuration example
[thresholds]
n_plus_one_sql_critical_max = 0
n_plus_one_http_warning_max = 3
io_waste_ratio_max = 0.30
[detection]
n_plus_one_min_occurrences = 5
window_duration_ms = 500
slow_query_threshold_ms = 500
slow_query_min_occurrences = 3
max_fanout = 20
chatty_service_min_calls = 15
pool_saturation_concurrent_threshold = 10
serialized_min_sequential = 3
[green]
enabled = true
default_region = "eu-west-3"
[daemon]
listen_address = "127.0.0.1"
listen_port_http = 4318
listen_port_grpc = 4317
json_socket = "/tmp/perf-sentinel.sock"
max_active_traces = 10000
trace_ttl_ms = 30000
sampling_rate = 1.0
max_events_per_trace = 1000
max_payload_size = 16777216
# Optional: enable TLS on both gRPC and HTTP listeners.
# Both fields must be set together (or both absent for plain TCP).
# tls_cert_path = "/etc/tls/server-cert.pem"
# tls_key_path = "/etc/tls/server-key.pem"
api_enabled = true
max_retained_findings = 10000
# Optional: tune the bounded queues (defaults shown). Raise under bursty
# load to reduce ingestion backpressure / analysis shedding.
ingest_queue_capacity = 1024
analysis_queue_capacity = 1024
# Optional: cross-trace correlation (daemon mode only)
# [daemon.correlation]
# enabled = true
# window_minutes = 10
# lag_threshold_ms = 2000
Migration from 0.5.x
Eight legacy top-level keys were deprecated in 0.5.26 and removed in 0.6.0. A 0.5.x config that still uses any of them now fails at load time with a migration message rather than silently falling back to the default. Update to the sectioned form below before upgrading.
| Removed (top-level) | Use instead | Section |
|---|---|---|
n_plus_one_threshold | n_plus_one_min_occurrences | [detection] |
window_duration_ms | window_duration_ms | [detection] |
listen_addr | listen_address | [daemon] |
listen_port | listen_port_http | [daemon] |
max_active_traces | max_active_traces | [daemon] |
trace_ttl_ms | trace_ttl_ms | [daemon] |
max_events_per_trace | max_events_per_trace | [daemon] |
max_payload_size | max_payload_size | [daemon] |
Migration example. Before (0.5.x):
n_plus_one_threshold = 5
listen_port = 4318
max_payload_size = 2097152
After (0.6.0+):
[detection]
n_plus_one_min_occurrences = 5
[daemon]
listen_port_http = 4318
max_payload_size = 2097152
Loading a 0.5.x file on 0.6.0 returns a ConfigError::Validation whose message names both the removed key and its replacement, so a single tail of the error stream tells you exactly what to edit.
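Keys whose name did not change still have to move under their section header, since the top-level form is what was removed. A sketch with illustrative values, mixing unchanged and renamed keys from the table above:

```toml
# Before (0.5.x), top-level keys, now rejected at load:
# window_duration_ms = 500
# trace_ttl_ms = 30000
# listen_addr = "127.0.0.1"

# After (0.6.0+):
[detection]
window_duration_ms = 500       # same name, moved under [detection]

[daemon]
trace_ttl_ms = 30000           # same name, moved under [daemon]
listen_address = "127.0.0.1"   # renamed from listen_addr
```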
Environment variables
Configuration files must never contain secrets. For sensitive values (API keys, tokens), use environment variables in your deployment tooling. perf-sentinel itself does not read environment variables for configuration.
Acknowledgments file
.perf-sentinel-acknowledgments.toml is a separate file from .perf-sentinel.toml. It lives at the root of the application repo and lists findings the team has accepted as known. Acknowledged findings are filtered from the CLI output (analyze, report, inspect, diff) and excluded from the quality gate.
Loading rules:
- The default path is ./.perf-sentinel-acknowledgments.toml in the current working directory. Override with --acknowledgments <path>.
- If the file does not exist, loading is a no-op (no error, no output noise).
- --no-acknowledgments skips the file entirely (audit view).
- A typo in signature, a missing required field, or a malformed expires_at fails the run loudly rather than silently widening the matched set.
Minimal entry:
[[acknowledged]]
signature = "redundant_sql:order-service:POST__api_orders:cafebabecafebabecafebabecafebabe"
acknowledged_by = "alice@example.com"
acknowledged_at = "2026-05-02"
reason = "Cache invalidation pattern, intentional. See ADR-0042."
The expires_at = "YYYY-MM-DD" field is optional. Omitting it makes the ack permanent. Setting it lets you require a periodic re-evaluation: when the date passes, the ack stops applying and the finding reappears in the next CI run.
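For a time-boxed acknowledgment, the same entry can carry an expiry date. The sketch below reuses the signature from the minimal entry; the date and the reason text are illustrative.

```toml
[[acknowledged]]
signature = "redundant_sql:order-service:POST__api_orders:cafebabecafebabecafebabecafebabe"
acknowledged_by = "alice@example.com"
acknowledged_at = "2026-05-02"
reason = "Accepted until the caching refactor lands."
# After this date the ack stops applying and the finding reappears in CI.
expires_at = "2026-08-01"
```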
There is no glob or wildcard support: each entry is matched against an exact signature. Signatures are emitted on every finding in the JSON output, so copy-paste them into the file rather than recomputing the SHA-256 prefix by hand.
For the full workflow and FAQ, see Acknowledgments.