The three hard invariants

monsys.ai’s AI observability is deliberately narrower than most tools in this category. Three invariants define that scope, and they hold regardless of customer request.

1. Passive, never autonomous

monsys.ai never runs prompts. It takes no actions. It blocks nothing inline. It is an observability layer — after-the-fact evidence, not a control plane.

No AI controlling other AI. Full stop.

In practice:

No “auto-fix” buttons that rewrite a prompt.
No “AI judge” that auto-classifies your traces (LLM-as-judge lives in Langfuse, not here).
No request-blocking proxy. Our SDK runs outside your request path and fails silently if the hub is unreachable.
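
The non-blocking behaviour in the last point can be sketched as follows. This is a minimal illustration, not the monsys SDK: the class name PassiveEmitter, its emit and flush methods, the send callback, and the dropped counter are all hypothetical. Spans go into a bounded in-memory queue and a background thread forwards them, so a slow or unreachable hub never delays or fails the caller.

```python
import queue
import threading
from typing import Callable


class PassiveEmitter:
    """Hypothetical fire-and-forget span emitter (not the real SDK API)."""

    def __init__(self, send: Callable[[dict], None], max_buffer: int = 1000) -> None:
        self._send = send  # assumed transport callback, e.g. an HTTP POST to the hub
        self._queue: "queue.Queue[dict]" = queue.Queue(maxsize=max_buffer)
        self.dropped = 0
        threading.Thread(target=self._drain, daemon=True).start()

    def emit(self, span: dict) -> None:
        """Never blocks and never raises into the caller's request path."""
        try:
            self._queue.put_nowait(span)
        except queue.Full:
            self.dropped += 1  # buffer full: drop the span rather than stall the app

    def _drain(self) -> None:
        while True:
            span = self._queue.get()
            try:
                self._send(span)
            except Exception:
                self.dropped += 1  # hub unreachable: fail silently
            finally:
                self._queue.task_done()

    def flush(self) -> None:
        """Wait until every queued span has been sent or dropped (for tests/shutdown)."""
        self._queue.join()


def unreachable_hub(span: dict) -> None:
    raise ConnectionError("hub unreachable")


emitter = PassiveEmitter(unreachable_hub)
emitter.emit({"trace_id": "t1", "name": "llm.call"})  # returns immediately, no exception
emitter.flush()
```

The bounded queue is a deliberate trade-off in this sketch: when the hub is down, the emitter loses observability data rather than adding latency or errors to the application.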

Why: AI systems controlling other AI systems create accountability vacuums. Who was the operator? Who confirmed the action? The AI Act (art. 14) mandates human oversight. A passive layer keeps humans in the loop.

2. PII redacted at the source

EU PII is detected before storage, using checksum validation wherever the format defines a checksum, and replaced with a kind-tagged token. What we detect:

IBAN: 36 SEPA countries (ISO 13616 mod-97). The kind suffix is the ISO country code in uppercase: [IBAN_BE], [IBAN_NL], [IBAN_FR], [IBAN_DE], [IBAN_ES], [IBAN_IT], [IBAN_PT], [IBAN_LU], [IBAN_AT], … False positives on random strings are very rare, since a candidate must carry a known country prefix and pass the checksum.
Rijksregister (BE): mod-97 over the 9-digit base, checked against the 2 control digits.
BTW-BE / KBO: mod-97 over the first 8 digits, checked against the 2 control digits.
BSN (NL): weighted-sum mod-11. BSNs starting with 0 are rejected (reserved in practice).
NIR (FR): mod-97 over the 13-digit base. First digit ∈ {1,2,3,4}.
Phone (E.164): leading +CC with 8–15 digits. Not country-specific.
Email: RFC-conformant regex.
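
The checksum rules listed above can be sketched in a few lines. This is an illustrative Python version, not the Go code in the hub: the function names and the five-country IBAN_LENGTHS table are assumptions made for the example. The IBAN length check and the RRN rule of prefixing "2" to the base for people born in 2000 or later come from the public format definitions, not from the list above.

```python
import re

# Illustrative subset only; the real detector covers 36 SEPA countries.
IBAN_LENGTHS = {"BE": 16, "NL": 18, "DE": 22, "FR": 27, "LU": 20}


def iban_valid(raw: str) -> bool:
    """ISO 13616 check: move the first 4 chars to the end, map A..Z to 10..35, mod 97 == 1."""
    s = re.sub(r"\s+", "", raw).upper()
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]+", s):
        return False
    if IBAN_LENGTHS.get(s[:2]) != len(s):
        return False
    rearranged = s[4:] + s[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)  # '0'..'9' stay, 'A' -> 10
    return int(digits) % 97 == 1


def rrn_valid(raw: str) -> bool:
    """Belgian Rijksregister: control = 97 - (9-digit base mod 97)."""
    s = re.sub(r"[.\-\s]", "", raw)
    if not re.fullmatch(r"\d{11}", s):
        return False
    base, control = s[:9], int(s[9:])
    # Born before 2000: plain base. Born in 2000 or later: base prefixed with "2".
    return any(97 - int(prefix + base) % 97 == control for prefix in ("", "2"))


def bsn_valid(raw: str) -> bool:
    """Dutch BSN: weighted sum (9..2, then -1) must be divisible by 11."""
    if not re.fullmatch(r"[1-9]\d{8}", raw):  # leading 0 rejected, as stated above
        return False
    weights = (9, 8, 7, 6, 5, 4, 3, 2, -1)
    return sum(w * int(d) for w, d in zip(weights, raw)) % 11 == 0
```

Changing a single digit in a valid number makes each of these checks fail, which is what lets the detector ignore most digit strings that merely look like an identifier.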

Token format in pattern-redact mode: [KIND] or for IBAN [IBAN_<country>]. In hash-only mode a 12-char SHA256 prefix is appended: [IBAN_BE:43d1151bbe0b].
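
As a rough sketch of the two token shapes described above: the function name redaction_token and its parameters are hypothetical, and hashing the unsalted UTF-8 bytes of the detected value is an assumption. The text does not say what exactly is hashed, so this sketch is not expected to reproduce the example token shown above.

```python
import hashlib


def redaction_token(kind: str, value: str, with_hash: bool = False) -> str:
    """Build a kind-tagged token such as [IBAN_BE] or [IBAN_BE:<12 hex chars>]."""
    if not with_hash:
        return f"[{kind}]"
    # Assumption: SHA256 over the detected value itself; a real system may salt or key it.
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]
    return f"[{kind}:{digest}]"


plain = redaction_token("IBAN_BE", "BE68539007547034")           # "[IBAN_BE]"
hashed = redaction_token("IBAN_BE", "BE68539007547034", True)    # "[IBAN_BE:" + 12 hex chars + "]"
```

Because the digest depends only on the input value, the same IBAN always maps to the same token, while two different IBANs from the same country get different suffixes.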

Raw PII never reaches the hub file system or database.

In practice:

Redaction runs in hub/api/ai/redaction.go, before the content blob is hashed and stored.
Checksum validation keeps false positives low: only around 1 to 2% of random 11-digit strings pass the RRN mod-97 check by chance, and any number that fails the check is never treated as PII.
The hash token is consistent for the same PII within a trace, so span content still shows that the same IBAN recurs without revealing what it was.

The tenant attribute redaction_level controls behaviour:

Level	What gets stored
`off`	Full content (sandbox only)
`hash-only`	Only SHA256 of the redacted content
`pattern-redact`	Content with PII replaced by tokens (default)
`full-content-strip`	Empty strings for prompt/completion
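
The table can be read as a simple dispatch on the tenant attribute. The sketch below is illustrative only: the function name content_to_store and its parameters are hypothetical, and it treats hash-only as the hex SHA256 digest of the already-redacted text, which is one reading of the table rather than confirmed behaviour.

```python
import hashlib


def content_to_store(raw: str, redacted: str, level: str = "pattern-redact") -> str:
    """Return what would be persisted for a prompt or completion at a given redaction_level."""
    if level == "off":
        return raw  # full content, sandbox only
    if level == "hash-only":
        return hashlib.sha256(redacted.encode("utf-8")).hexdigest()
    if level == "pattern-redact":
        return redacted  # PII already replaced by tokens
    if level == "full-content-strip":
        return ""
    # Unknown values are rejected rather than silently falling back to a weaker level.
    raise ValueError(f"unknown redaction_level: {level!r}")
```

Raising on an unknown level is a design choice of this sketch: a typo in tenant configuration should not result in more content being stored than intended.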

3. Ed25519-signed evidence packs

Every month (or for any requested period) monsys.ai generates an Ed25519-signed tarball. The manifest contains:

pack_id, tenant_id, app_id, period_start, period_end
trace_count, span_count, blob_count
traces_sha256, spans_sha256, blob_index[] (hash per blob)
signing_public_hex (the Ed25519 public key)
The signature itself is stored in manifest.sig and covers the exact bytes of manifest.json.

Your auditor verifies offline, without a monsys account.

The script tools/evidence-pack-verify.py (no monsys dependencies, only the cryptography package) uses the public key embedded in the manifest itself. Because anyone who re-signed a modified pack could embed their own key, the auditor must also check that key against the one monsys publishes (e.g. on a security page, or in a printed certificate we provide).

Tamper detection

If even one byte of a blob, traces.jsonl, or spans.jsonl is changed, the hash comparison fails and the script exits with code 1. It reports exactly which artifact is wrong.
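
The hash comparison can be sketched as below. This is not the actual evidence-pack-verify.py: the function names are hypothetical, the pack is assumed to be already unpacked into a directory, and each blob_index entry is assumed to be an object with "path" and "sha256" keys (the real entry layout is not documented here). The sketch covers only the hash step; the Ed25519 signature over manifest.json has to be verified first, otherwise the expected hashes themselves cannot be trusted.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Stream a file through SHA256 so large artifacts need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def find_tampered(pack_dir: Path) -> list[str]:
    """Return the artifacts whose current hash differs from the manifest (or that are missing)."""
    manifest = json.loads((pack_dir / "manifest.json").read_text())
    expected = {
        "traces.jsonl": manifest["traces_sha256"],
        "spans.jsonl": manifest["spans_sha256"],
    }
    for entry in manifest["blob_index"]:  # assumed shape: {"path": ..., "sha256": ...}
        expected[entry["path"]] = entry["sha256"]
    return [
        name
        for name, digest in expected.items()
        if not (pack_dir / name).is_file() or sha256_file(pack_dir / name) != digest
    ]
```

A command-line wrapper would print the returned names and exit with code 1 when the list is non-empty, matching the behaviour described above.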

What this is not

Not a prompt engineering tool. No playground, no versioned prompts, no A/B experiments. For that, use Langfuse or similar — and stream its output into monsys for the audit portion.

Not an eval framework. No LLM-as-judge, no built-in datasets. You can inject eval results as span attributes.

Not a blocking guardrail layer. For real-time policy enforcement there are specialised tools; monsys.ai runs asynchronously, in parallel with your inference path.