Ed25519 signing-key rotation

Emergency-action tokens (playbook runs, isolation commands, agent self-update wrappers) are Ed25519-signed by the hub. The agent verifies them against a pinned public key. This page describes how to rotate that key with zero downtime.

The problem

A naive rotation issues a new key and has all agents verify against it immediately, so in-flight tokens signed with the old key suddenly become invalid. On compromise you WANT that (instant revocation), but for a planned rotation you want a grace period.

How it works

The hub keeps a set of hub_signing_keys per tenant. Every key has an is_active flag and an optional expires_at. On rotation:

  1. Generate a new Ed25519 keypair
  2. INSERT the public part into hub_signing_keys with is_active=true, expires_at=NULL
  3. Existing active keys get expires_at = NOW() + grace_days × INTERVAL '1 day'
  4. The private key is shown ONCE in the response, then never again
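
Steps 2 and 3 can be sketched on an in-memory key list. This is a minimal illustration, not the hub's code: the SigningKey class and the rotate function are hypothetical names, the new public key is passed in rather than generated, and leaving already-expired keys untouched is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional
import uuid


@dataclass
class SigningKey:
    # Hypothetical in-memory stand-in for one hub_signing_keys row.
    id: str
    public_hex: str
    is_active: bool = True
    expires_at: Optional[datetime] = None  # None means "no expiry"


def rotate(keys: List[SigningKey], new_public_hex: str,
           grace_days: int, now: datetime) -> SigningKey:
    """Apply steps 2 and 3 to the key set of one tenant."""
    grace_end = now + timedelta(days=grace_days)
    for key in keys:
        # Assumption: keys that have already expired are left untouched.
        if key.is_active and (key.expires_at is None or key.expires_at > now):
            key.expires_at = grace_end
    # The new key starts active with no expiry.
    new_key = SigningKey(id=str(uuid.uuid4()), public_hex=new_public_hex)
    keys.append(new_key)
    return new_key
```

With grace_days=7, a key that had no expiry before the call ends up expiring exactly seven days after now, while the returned key has expires_at set to None.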

The agent periodically fetches GET /api/v1/agents/:id/signing-keys/active, which returns only active, non-expired keys. During grace the agent therefore holds BOTH keys in its trust set; tokens signed with either the old or the new key validate.

After expires_at of the old key, it falls out of the trust set automatically.
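
The agent-side behavior can be sketched as a filter plus an "any key verifies" check. The names TrustedKey, trust_set and token_is_valid are hypothetical, the Ed25519 check is injected as a verify callable rather than tied to a specific library, and treating a key as expired exactly at expires_at is an assumption.

```python
from datetime import datetime, timezone
from typing import Callable, Iterable, List, NamedTuple, Optional


class TrustedKey(NamedTuple):
    public_hex: str
    expires_at: Optional[datetime]  # None means "no expiry"


def trust_set(keys: Iterable[TrustedKey], now: datetime) -> List[TrustedKey]:
    """Keep only keys that have not expired at `now`."""
    # Assumption: a key counts as expired once now >= expires_at.
    return [k for k in keys if k.expires_at is None or k.expires_at > now]


def token_is_valid(token: bytes, signature: bytes,
                   keys: Iterable[TrustedKey], now: datetime,
                   verify: Callable[[str, bytes, bytes], bool]) -> bool:
    """Accept the token if any currently trusted key verifies the signature.

    `verify(public_hex, token, signature)` stands in for a real Ed25519
    verification routine supplied by the caller.
    """
    return any(verify(k.public_hex, token, signature)
               for k in trust_set(keys, now))
```

During grace both keys pass the filter, so a token signed by either one validates; once the old key's expires_at has passed, only the new key remains.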

How you do it in the UI

/settings → Signing keys tab:

  1. Give a reason (“annual rotation”, “suspected compromise”, …)
  2. Pick grace days (default 7, max 90)
  3. Click Rotate now
  4. Copy the shown private_hex immediately — not stored, shown once
  5. Paste it into the hub deployment config (env var MONSYS_EMERGENCY_PRIVATE_KEY_HEX or secrets manager)
  6. Restart the hub. From that moment the hub signs with the new key; old tokens stay valid for grace_days.
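
Before restarting, it can help to check that the pasted value is at least well-formed. The sketch below is a hypothetical pre-flight check, not part of the hub: the function name is invented, and it only confirms that MONSYS_EMERGENCY_PRIVATE_KEY_HEX is set and contains an even number of hex digits.

```python
import os
import string


def load_signing_key_hex(env=os.environ) -> str:
    """Return the configured private key hex, or raise if it looks wrong."""
    name = "MONSYS_EMERGENCY_PRIVATE_KEY_HEX"
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set")
    # Only hex digits are allowed, and they must form whole bytes.
    if len(value) % 2 != 0 or any(c not in string.hexdigits for c in value):
        raise RuntimeError(f"{name} is not valid hex")
    return value
```

The check never prints the key itself, so it is safe to run in a deploy log.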

The table below the rotate button shows all keys: ACTIVE (no expiry), EXPIRES <date> (in grace), or RETIRED.

Compromise scenario

On suspected compromise:

  1. Rotate with grace_days=0 (the old keys expire immediately)
  2. Paste the new private key into the deploy
  3. Restart the hub
  4. All old tokens become invalid immediately

Side effect: in-flight playbook runs that the agent has not yet received can fail. Use this ONLY for a REAL compromise, not for planned maintenance.

API

GET /api/v1/signing-keys (admin only)
POST /api/v1/signing-keys/rotate (admin, rate-limit 5/h)
GET /api/v1/agents/:id/signing-keys/active (agent-auth)

Body of POST /api/v1/signing-keys/rotate:

{
  "reason": "annual rotation 2026",
  "grace_days": 7
}

Response (ONE TIME):

{
  "id": "uuid",
  "public_hex": "abc…64",
  "private_hex": "def…128",
  "expires_grace_days": 7,
  "expires_at": "2026-05-17T...Z",
  "warning": "Save the private key now — it is shown only this once."
}
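
A rotation request could be prepared from a script roughly as follows. This is a sketch under assumptions: the function name, the base URL, and the bearer-token Authorization header are hypothetical (the source only says the endpoint is admin-only), and the 0 to 90 range check mirrors the limit documented for the UI, which the API may or may not enforce in the same way.

```python
import json
import urllib.request


def build_rotate_request(base_url: str, admin_token: str,
                         reason: str, grace_days: int) -> urllib.request.Request:
    """Build (but do not send) a POST /api/v1/signing-keys/rotate request."""
    # Mirrors the UI limit (max 90); grace_days=0 is the compromise case.
    if not 0 <= grace_days <= 90:
        raise ValueError("grace_days must be between 0 and 90")
    body = json.dumps({"reason": reason, "grace_days": grace_days}).encode()
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/signing-keys/rotate",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; the real scheme is not documented here.
            "Authorization": f"Bearer {admin_token}",
        },
    )
```

Passing the result to urllib.request.urlopen would send it. Because the response carries private_hex only once, the caller should store it in the secrets manager before doing anything else with the response, and keep in mind the 5/h rate limit when retrying.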

Audit

Every rotation logs to audit_log with resource_type='signing_key', resource_id=<new_key_id>, IP, user, reason. Reviewable via /audit?resource_type=signing_key.