mTLS — hub ↔ agent client certificates
Every agent gets a hub-signed X.509 client certificate at first
boot. From that moment on, every request to api.monsys.ai proves
two things: that the caller presents a valid bearer token (as
before), and that the client side holds the corresponding private key
in /var/lib/monsys/mtls/client.key. A leaked token alone is no longer
enough to impersonate an agent: the attacker also needs the on-disk
key, which is delivered once at issuance and never leaves the host
afterwards.
Threat model
Before mTLS (bearer-only):
- Token theft from an inventory dump, log line, or memory scrape → an attacker can call every agent endpoint as that agent until the operator rotates the token.
After mTLS:
- Token theft alone is no longer enough: /var/lib/monsys/mtls/client.key (root-owned, mode 600) must be exfiltrated separately.
- A misconfigured proxy that strips the cert is rejected: Caddy sets the X-Monsys-Client-Verified header from the cert fingerprint, and hub middleware aborts if the CN doesn’t match the bearer’s agent_id.
How it works
- CA bootstrap (one-time). On first start the hub generates an RSA-4096 CA, encrypts the private key with CLOUD_ENCRYPTION_KEY (AES-256-GCM), and stores both in the singleton hub_settings row. The public CA cert is also exposed unauthenticated at GET /api/v1/agents/ca-cert so agents can pin it.
- Per-agent cert issue. The agent calls POST /api/v1/agents/issue-client-cert with its bearer token. The hub signs an RSA-2048 cert (CN = agent_id UUID, OU = tenant_id UUID) with 365-day validity, stores the public cert in agent_certificates, and returns cert + key + CA PEM once.
- Persistence. The agent writes three files, mode 600, under /var/lib/monsys/mtls/: client.crt, client.key, hub-ca.crt.
- Every subsequent request uses the cert. Caddy is configured with client_auth.mode = verify_if_given against the hub CA, so older bearer-only agents keep working during rollout.
- Caddy propagation. After verification, Caddy injects X-Monsys-Client-Subject (full DN) and X-Monsys-Client-Verified (cert fingerprint) into the upstream request. Inbound copies of these headers are stripped first so a non-mTLS client can’t forge them.
- Hub cross-check. The AgentAuth middleware extracts the CN from the subject DN and compares it to the bearer-resolved agent_id. Match: success + last_seen_at bumped on the cert row. Mismatch: HTTP 401 + integrity_anomaly recorded, a strong signal of token theft or proxy misconfiguration.
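The hub cross-check in the last step can be sketched as follows. This is a minimal illustration, not the actual AgentAuth code: the function names are hypothetical, the DN is assumed to be a plain comma-separated string such as CN=<uuid>,OU=<uuid> with no escaped commas, and a missing subject header is assumed to pass because of verify_if_given.

```python
from typing import Optional


def common_name(subject_dn: str) -> Optional[str]:
    """Return the CN attribute from a comma-separated subject DN, or None.

    Assumes a simple "CN=...,OU=..." layout with no escaped commas,
    which holds when CN and OU are UUIDs.
    """
    for part in subject_dn.split(","):
        key, sep, value = part.strip().partition("=")
        if sep and key.strip().upper() == "CN":
            return value.strip()
    return None


def cross_check(subject_dn: Optional[str], bearer_agent_id: str) -> int:
    """Return the HTTP status the middleware would answer with.

    subject_dn stands for the X-Monsys-Client-Subject header value,
    bearer_agent_id for the agent_id resolved from the bearer token.
    """
    if not subject_dn:
        # No client cert presented: allowed during rollout (verify_if_given).
        return 200
    cn = common_name(subject_dn)
    if cn is None or cn.lower() != bearer_agent_id.lower():
        # Mismatch: the real hub also records an integrity_anomaly here.
        return 401
    return 200
```

A real middleware would additionally bump last_seen_at on a match and reject certificates whose row is revoked; both are left out to keep the sketch short.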
Rollout — what changes for existing agents
Nothing immediate. Caddy’s verify_if_given mode allows non-mTLS
connections to keep working. On the next agent auto-update (or
manual restart), the new binary calls issue-client-cert once and
all subsequent traffic is mTLS-authenticated. No downtime, no
re-enrollment, no token change.
The Trust Score agent_health component soft-penalises
(-5 points) agents that have not yet bootstrapped a cert. This
nudges operators to roll out the new binary without forcing a hard
break.
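On the agent side, the one-time bootstrap amounts to checking whether the three files exist and, if not, writing the issued bundle with mode 600. The sketch below assumes that logic; the helper names are hypothetical and the HTTP call to issue-client-cert is deliberately left out.

```python
import os
from pathlib import Path

# File names come from the Persistence step; everything else is illustrative.
FILES = ("client.crt", "client.key", "hub-ca.crt")


def needs_bootstrap(mtls_dir: Path) -> bool:
    """True when any of the three files is missing, i.e. no cert was issued yet."""
    return not all((mtls_dir / name).is_file() for name in FILES)


def write_private(path: Path, pem: str) -> None:
    """Create or overwrite a file so that it ends up with mode 600."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(pem)
    # Enforce the mode even if the file already existed with looser bits.
    os.chmod(path, 0o600)


def persist_bundle(mtls_dir: Path, cert_pem: str, key_pem: str, ca_pem: str) -> None:
    """Write client.crt, client.key and hub-ca.crt under mtls_dir."""
    mtls_dir.mkdir(parents=True, exist_ok=True)
    for name, pem in zip(FILES, (cert_pem, key_pem, ca_pem)):
        write_private(mtls_dir / name, pem)
```

An agent would call needs_bootstrap(Path("/var/lib/monsys/mtls")) at startup and only contact the hub when it returns True.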
Operational notes
| Topic | Detail |
|---|---|
| CA expiry | 10 years from first boot. Alert wired into ca_not_after for future automation. |
| Client cert expiry | 365 days. The agent re-fetches automatically when within 30 days of expiry. |
| CA private key | AES-256-GCM-encrypted in hub_settings.ca_key_enc. Restore requires the same CLOUD_ENCRYPTION_KEY — back it up out-of-band. |
| Rotation | POST /api/v1/agents/issue-client-cert rotates: old row marked revoked_at = NOW(), revoke_reason = 'rotated', new row inserted in one tx. |
| Revocation | Operator marks revoked_at in the DB. Hub middleware rejects revoked certs even though Caddy still accepts them (until a future CRL endpoint). |
| Key algorithms | RSA-2048 client keys, RSA-4096 CA. Ed25519 reserved for Emergency Action Tokens where we control both sides. |
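The rotation and revocation rows can be illustrated with a small in-memory model. This is a sketch under assumptions: it uses SQLite instead of the hub's real database (so datetime('now') stands in for NOW()), and apart from revoked_at and revoke_reason the column names are hypothetical.

```python
import sqlite3

# Hypothetical minimal schema; only revoked_at and revoke_reason are
# taken from the table above.
SCHEMA = """
CREATE TABLE agent_certificates (
    id            INTEGER PRIMARY KEY,
    agent_id      TEXT NOT NULL,
    fingerprint   TEXT NOT NULL,
    revoked_at    TEXT,
    revoke_reason TEXT
)
"""


def rotate(conn: sqlite3.Connection, agent_id: str, new_fingerprint: str) -> None:
    """Revoke the agent's current cert rows and insert the new one atomically."""
    with conn:  # one transaction: commit on success, rollback on exception
        conn.execute(
            "UPDATE agent_certificates "
            "SET revoked_at = datetime('now'), revoke_reason = 'rotated' "
            "WHERE agent_id = ? AND revoked_at IS NULL",
            (agent_id,),
        )
        conn.execute(
            "INSERT INTO agent_certificates (agent_id, fingerprint) VALUES (?, ?)",
            (agent_id, new_fingerprint),
        )


def is_active(conn: sqlite3.Connection, fingerprint: str) -> bool:
    """Middleware-style check: a cert is usable only while revoked_at is NULL."""
    row = conn.execute(
        "SELECT 1 FROM agent_certificates "
        "WHERE fingerprint = ? AND revoked_at IS NULL",
        (fingerprint,),
    ).fetchone()
    return row is not None
```

Because Caddy keeps accepting a revoked cert at the TLS layer, a lookup like is_active is what actually enforces revocation until a CRL endpoint exists.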
Endpoints
| Method | Path | Auth | What |
|---|---|---|---|
| GET | /api/v1/agents/ca-cert | none | Hub CA public certificate (for trust pinning) |
| POST | /api/v1/agents/issue-client-cert | bearer | Issue or rotate the calling agent’s client cert + private key |
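As a rough illustration of the two calls, the requests could be built like this with the Python standard library. The Authorization: Bearer header format and the empty POST body are assumptions, since the request and response schemas are not specified here, and nothing is sent over the network.

```python
import urllib.request

HUB = "https://api.monsys.ai"


def ca_cert_request() -> urllib.request.Request:
    """Unauthenticated fetch of the hub CA certificate, for pinning."""
    return urllib.request.Request(HUB + "/api/v1/agents/ca-cert", method="GET")


def issue_cert_request(token: str) -> urllib.request.Request:
    """Bearer-authenticated request to issue or rotate the client cert.

    Assumption: the token travels in a standard Authorization header
    and the endpoint needs no request body.
    """
    return urllib.request.Request(
        HUB + "/api/v1/agents/issue-client-cert",
        data=b"",
        method="POST",
        headers={"Authorization": "Bearer " + token},
    )
```

Either request object can be passed to urllib.request.urlopen; parsing the response is omitted because its format is not documented in this section.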
Files on the agent host
    /var/lib/monsys/mtls/
    ├── client.crt   # RSA-2048 client certificate (PEM)
    ├── client.key   # RSA-2048 private key (PEM, mode 600)
    └── hub-ca.crt   # hub CA root (PEM)

Compliance mapping: ISO 27001 A.8.20 (Network security management) +
CRA Annex I §3 (secure-by-default communication). Both controls are
auto-evaluated by counting active, non-revoked rows in
agent_certificates.
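The counting step could look roughly like the query below. It is a sketch only: "active" is assumed to mean "not yet expired", the not_after column is hypothetical, and SQLite stands in for the hub's real database.

```python
import sqlite3


def count_active_certs(conn: sqlite3.Connection, now_iso: str) -> int:
    """Count rows that are neither revoked nor expired.

    now_iso is an ISO-8601 UTC timestamp; strings in that format
    compare correctly as plain text.
    """
    row = conn.execute(
        "SELECT COUNT(*) FROM agent_certificates "
        "WHERE revoked_at IS NULL AND not_after > ?",
        (now_iso,),
    ).fetchone()
    return row[0]
```

How the resulting count maps to a pass or fail for each control is not described here, so the sketch stops at the number itself.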