Insights — anomaly detection

/insights shows metrics whose current value falls statistically outside their recent normal range. It is not machine learning, and we don’t claim anywhere that it is: it’s a z-score against a 7-day baseline.

Method

For every (agent, metric) in {CPU%, memory%, network total bps}:

baseline_mean   = AVG(metric)        over [NOW-7d, NOW-1d]
baseline_stddev = STDDEV_POP(metric) over [NOW-7d, NOW-1d]
current         = AVG(metric)        over [NOW-15min, NOW]
z               = (current - baseline_mean) / baseline_stddev
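
The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the actual query behind /insights: the function name z_score and the choice to return None for a perfectly flat baseline (stddev of zero) are assumptions, since the text does not say how that case is handled. statistics.pstdev is the population standard deviation, which matches STDDEV_POP.

```python
from statistics import fmean, pstdev


def z_score(baseline, current):
    """Return the z-score of the current window against the baseline.

    baseline: metric samples from the baseline window (NOW-7d .. NOW-1d)
    current:  metric samples from the current window (NOW-15min .. NOW)
    """
    baseline_mean = fmean(baseline)
    baseline_stddev = pstdev(baseline)  # population stddev, like STDDEV_POP
    if baseline_stddev == 0:
        # Assumption: a flat baseline makes z undefined, so nothing is returned.
        return None
    return (fmean(current) - baseline_mean) / baseline_stddev
```

For example, a baseline of [10, 12, 8, 10] has mean 10 and population stddev √2 ≈ 1.41, so a current window averaging 15 gives z ≈ 3.54.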

A metric is flagged as anomalous when |z| > 2.5. For a normally distributed metric that corresponds to a false-positive rate of about 1.2% (roughly 0.6% in each tail). In practice the rate is higher because metrics are rarely truly Gaussian, but the threshold is good enough to surface the real outliers.
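
The Gaussian tail probability behind that threshold can be checked directly. The sketch below uses the identity P(|Z| > t) = erfc(t / √2) for a standard normal Z; the function name two_sided_tail is only illustrative. Because flags fire on |z|, both tails count: about 0.6% above plus about 0.6% below, about 1.2% in total.

```python
import math


def two_sided_tail(threshold):
    """P(|Z| > threshold) for a standard normal variable Z."""
    return math.erfc(threshold / math.sqrt(2))


# Fraction of evaluations flagged by chance under a perfect Gaussian.
print(f"{two_sided_tail(2.5):.4f}")  # 0.0124
```

The same function makes it easy to see how the expected noise level changes if the threshold is moved, for instance to 3.0.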

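Only agents with ≥60 baseline samples qualify (at a 15 s sampling interval, 60 samples is about 15 minutes of data). Without this floor, a newly registered agent would always look “abnormal” because its baseline is too thin to be meaningful.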
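
Putting the sample floor and the threshold together, the decision for one (agent, metric) pair might look like the sketch below. The two constants come from the text; the function name is_anomalous, the None return for series that do not qualify, and the handling of an empty current window or a zero stddev are assumptions made for illustration.

```python
from statistics import fmean, pstdev

MIN_BASELINE_SAMPLES = 60  # minimum baseline size stated in the text
Z_THRESHOLD = 2.5          # flag when |z| exceeds this


def is_anomalous(baseline, current):
    """Return True/False for a qualifying series, or None if not evaluated."""
    if len(baseline) < MIN_BASELINE_SAMPLES or not current:
        # Too little history (or no current data): skip instead of flagging.
        return None
    stddev = pstdev(baseline)
    if stddev == 0:
        # Assumption: a flat baseline is skipped because z is undefined.
        return None
    z = (fmean(current) - fmean(baseline)) / stddev
    return abs(z) > Z_THRESHOLD
```

Returning None rather than False keeps "not enough data" distinguishable from "evaluated and normal", which matters when counting how many agents are actually covered.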

What you see in the UI

Four KPI cards at the top:

Anomalies (live) — count of current flags
Agents covered — agents online in the last hour
Metrics (24h) — total datapoints analysed
Method — “z-score, threshold |z| > 2.5”

Below the KPIs is a table of the top 50 anomalies, sorted by |z| with the largest first. Columns: agent (clickable → agent detail), metric, current value, baseline mean, stddev, z-score, direction (↑ above / ↓ below).
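
The ordering and the direction column can be expressed compactly. This is a hypothetical sketch of the table logic, not the UI's real data model: the Anomaly class, its field names, and top_anomalies are invented here and only mirror the columns listed above.

```python
from dataclasses import dataclass


@dataclass
class Anomaly:
    agent: str
    metric: str
    current: float
    baseline_mean: float
    stddev: float
    z: float

    @property
    def direction(self):
        # The sign of z tells whether the current value is above or below the mean.
        return "↑ above" if self.z > 0 else "↓ below"


def top_anomalies(rows, limit=50):
    """Sort flagged rows by |z|, largest first, and keep the first `limit`."""
    return sorted(rows, key=lambda r: abs(r.z), reverse=True)[:limit]
```

Sorting on abs(z) means a strong dip (z = -4.5) ranks above a milder spike (z = 3.0), which is why the direction column is needed to tell the two apart.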

What this does WELL

Sudden spikes where the system is 4× its normal usage
Sudden dips (“why is this webserver’s CPU suddenly at 1%?”)
Pattern breaks: deviations from a normal range that you never had to configure explicitly

What this does NOT do

Seasonal patterns: there is no time-of-day model. If your nightly batch pegs CPU at 80% every 02:00, those samples sit in the baseline and widen it, so the batch can stop being flagged. Intentional.
Cross-metric correlations: each metric is scored on its own. A cross-metric ML model could correlate them; this method doesn’t.
Prediction — no “this server will fall over in 3 hours”. For predictive growth see Capacity planning.

How to use it operationally

Check /insights once per shift as part of your status check
Click through a row to see the agent and what’s going on
If the deviation is expected (“nightly batch”) → working as intended, ignore
If the deviation is unexpected → investigate

Not an alert replacement

Insights is passive observation, not a replacement for alert rules. An alert rule fires push notifications + audit log + webhooks; Insights asks you to look proactively. For pages that should wake you at night: use /alerts → Rules with severity=critical.