The three hard invariants
monsys.ai’s AI observability is deliberately narrower than most tools in this category. Three invariants summarise what we don’t do, regardless of customer request.
1. Passive, never autonomous
monsys.ai never runs prompts. It takes no actions. It blocks nothing inline. It is an observability layer — after-the-fact evidence, not a control plane.
No AI controlling other AI. Full stop.
In practice:
- No “auto-fix” buttons that rewrite a prompt.
- No “AI judge” that auto-classifies your traces (LLM-as-judge lives in Langfuse, not here).
- No request-blocking proxy. Our SDK runs outside your request path and fails silently if the hub is unreachable.
Why: AI systems controlling other AI systems create accountability vacuums. Who was the operator? Who confirmed the action? The AI Act (art. 14) mandates human oversight. A passive layer keeps humans in the loop.
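The "outside your request path, fails silently" behaviour described above can be illustrated with a small sketch. This is not the monsys SDK: the class name `PassiveEmitter`, the `send` callable, and the queue size are hypothetical and only show one common way to keep telemetry from blocking or breaking the application.

```python
import queue
import threading


class PassiveEmitter:
    """Hypothetical fire-and-forget emitter (illustration, not the monsys SDK)."""

    def __init__(self, send, max_pending=1000):
        self._send = send  # assumed callable that ships one event to the hub
        self._pending = queue.Queue(maxsize=max_pending)
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def emit(self, event):
        """Queue an event without blocking; return False if it was dropped."""
        try:
            self._pending.put_nowait(event)
            return True
        except queue.Full:
            return False  # drop the event rather than slow the caller down

    def _drain(self):
        # Runs on a background thread, so hub latency never reaches the caller.
        while True:
            event = self._pending.get()
            try:
                self._send(event)
            except Exception:
                pass  # hub unreachable: swallow the error, keep the app running
            finally:
                self._pending.task_done()

    def flush(self):
        """Wait until every queued event has been attempted (useful in tests)."""
        self._pending.join()
```

The design choice here is that a lost trace is preferable to a delayed or failed inference call, which matches the passive stance of the invariant.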
2. PII redacted at the source
EU-PII is detected before storage using checksum validation and replaced with a kind-tagged hash token. What we detect:
- IBAN: 36 SEPA countries (ISO 13616 mod-97). The kind suffix is the ISO country code in uppercase: [IBAN_BE], [IBAN_NL], [IBAN_FR], [IBAN_DE], [IBAN_ES], [IBAN_IT], [IBAN_PT], [IBAN_LU], [IBAN_AT], … Zero false positives on random strings.
- Rijksregister (BE): mod-97 over the 9-digit base + 2 control digits.
- BTW-BE / KBO: mod-97 over the 8 + 2 digits.
- BSN (NL): weighted-sum mod-11. BSNs starting with 0 are rejected (reserved in practice).
- NIR (FR): mod-97 over the 13-digit base. First digit ∈ {1,2,3,4}.
- Phone (E.164): leading +CC with 8–15 digits. Universal.
- Email: RFC-conform regex.
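The checksum rules listed above can be sketched as follows. This is an illustration of the public algorithms (ISO 13616 mod-97, the Belgian mod-97 rule, the Dutch 11-test), not the code in the hub. The function names are hypothetical. The IBAN check assumes only the generic 15 to 34 character shape and does not enforce per-country lengths. The RRN check also accepts the variant for people born from 2000 onward, where the base is prefixed with "2"; the text above does not mention that variant, so treat it as an assumption about what the detector does.

```python
import re


def iban_valid(iban: str) -> bool:
    """ISO 13616 mod-97: move the first 4 chars to the end, map A..Z to 10..35."""
    s = re.sub(r"\s+", "", iban).upper()
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", s):
        return False
    rearranged = s[4:] + s[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def rrn_valid(rrn: str) -> bool:
    """Belgian Rijksregister number: 9-digit base + 2 control digits (mod-97)."""
    s = re.sub(r"[.\-\s]", "", rrn)
    if not re.fullmatch(r"\d{11}", s):
        return False
    base, check = s[:9], int(s[9:])
    # Born before 2000: control = 97 - (base mod 97).
    # Born from 2000 on: same rule with "2" prefixed to the base.
    return check in (97 - int(base) % 97, 97 - int("2" + base) % 97)


def bsn_valid(bsn: str) -> bool:
    """Dutch BSN 11-test; a leading 0 is rejected, as described above."""
    if not re.fullmatch(r"[1-9]\d{8}", bsn):
        return False
    weights = (9, 8, 7, 6, 5, 4, 3, 2, -1)
    return sum(w * int(d) for w, d in zip(weights, bsn)) % 11 == 0
```

For example, `iban_valid("BE68 5390 0754 7034")` and `bsn_valid("111222333")` both return `True`, while changing the last digit of either value makes the check fail.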
Token format in pattern-redact mode: [KIND] or for IBAN
[IBAN_<country>]. In hash-only mode a 12-char SHA256 prefix is
appended: [IBAN_BE:43d1151bbe0b].
Raw PII never reaches the hub file system or database.
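A minimal sketch of the token format, assuming the 12-character suffix is the first 12 hex characters of an unsalted SHA-256 over the raw value. The text above does not say what exactly is hashed (the real implementation may salt the input or scope it to a trace), so the hash input is an assumption, and `make_token` and `iban_kind` are hypothetical names.

```python
import hashlib


def iban_kind(iban: str) -> str:
    """Derive the kind tag for an IBAN from its country code, e.g. IBAN_BE."""
    return "IBAN_" + iban.strip()[:2].upper()


def make_token(kind: str, value: str, with_hash: bool = False) -> str:
    """Build [KIND], or [KIND:<12 hex chars>] when a hash suffix is requested."""
    if not with_hash:
        return f"[{kind}]"
    # Assumption: unsalted SHA-256 of the UTF-8 value, truncated to 12 hex chars.
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]
    return f"[{kind}:{digest}]"
```

Because the suffix is derived deterministically from the value, the same IBAN always yields the same token, which is what lets a reader see that a value repeated without learning the value itself.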
In practice:
- Redaction runs in hub/api/ai/redaction.go, before the content blob is hashed and stored.
- Mod-97 checksum validation means zero false positives on invalid numbers: a random 11-digit string will almost never pass as an RRN.
- The hash token is consistent for the same PII within a trace, so you can still see in span content that “the same IBAN” repeated, without knowing what it was.
The tenant attribute redaction_level controls behaviour:
| Level | What gets stored |
|---|---|
| off | Full content (sandbox only) |
| hash-only | Only SHA256 of the redacted content |
| pattern-redact | Content with PII replaced by tokens (default) |
| full-content-strip | Empty strings for prompt/completion |
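The table can be read as a simple dispatch on redaction_level. The sketch below is an illustration under assumptions: `stored_form` and `redact` are hypothetical names, the `[EMAIL]` kind tag is assumed, and a single simplified email pattern stands in for the full detector set.

```python
import hashlib
import re

# Stand-in detector: one simplified email pattern instead of the full PII set.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(text: str) -> str:
    """Replace detected PII with kind tokens (email only in this sketch)."""
    return EMAIL.sub("[EMAIL]", text)


def stored_form(level: str, content: str) -> str:
    """Return what would be stored for a prompt or completion at a given level."""
    if level == "off":
        return content  # full content, sandbox only
    if level == "pattern-redact":
        return redact(content)  # default: PII replaced by tokens
    if level == "hash-only":
        # Only the SHA-256 of the already redacted content is kept.
        return hashlib.sha256(redact(content).encode("utf-8")).hexdigest()
    if level == "full-content-strip":
        return ""
    raise ValueError(f"unknown redaction_level: {level!r}")
```

For instance, `stored_form("pattern-redact", "mail jan@example.com now")` returns `"mail [EMAIL] now"`.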
3. Ed25519-signed evidence packs
Every month (or every requested period) monsys.ai generates an Ed25519-signed tarball. The manifest contains:
- pack_id, tenant_id, app_id, period_start, period_end
- trace_count, span_count, blob_count
- traces_sha256, spans_sha256, blob_index[] (hash per blob)
- signing_public_hex (the Ed25519 public key)
- Signature in manifest.sig over the bytes of manifest.json
Your auditor verifies offline, without a monsys account.
The script tools/evidence-pack-verify.py (no monsys dependencies,
only cryptography) uses the public key embedded in the manifest
itself. The auditor cross-verifies that key against what monsys
publishes publicly (e.g. a security page, or a printed certificate
we provide).
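The signature check described above can be sketched with the cryptography package. This is not the content of tools/evidence-pack-verify.py. It assumes that manifest.sig holds the raw 64-byte Ed25519 signature and that signing_public_hex is the hex-encoded raw public key; both encodings are assumptions. `verify_manifest` and `make_demo_pack` are hypothetical names, and the demo helper exists only so the check can be tried without a real pack.

```python
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def verify_manifest(manifest_bytes: bytes, signature: bytes,
                    trusted_public_hex: str) -> bool:
    """Check the embedded key against a trusted copy, then the signature."""
    manifest = json.loads(manifest_bytes)
    embedded_hex = manifest["signing_public_hex"]
    # Cross-check: the embedded key must match the independently published one.
    if embedded_hex.lower() != trusted_public_hex.lower():
        return False
    key = Ed25519PublicKey.from_public_bytes(bytes.fromhex(embedded_hex))
    try:
        key.verify(signature, manifest_bytes)  # signature covers the exact bytes
    except InvalidSignature:
        return False
    return True


def make_demo_pack():
    """Build a throwaway signed manifest (demo only, not a real pack)."""
    private_key = Ed25519PrivateKey.generate()
    public_hex = private_key.public_key().public_bytes(
        serialization.Encoding.Raw, serialization.PublicFormat.Raw
    ).hex()
    manifest_bytes = json.dumps(
        {"pack_id": "demo-pack", "signing_public_hex": public_hex}
    ).encode("utf-8")
    return manifest_bytes, private_key.sign(manifest_bytes), public_hex
```

Comparing the embedded key with an independently obtained copy matters: a signature that verifies only against the key shipped inside the pack proves internal consistency, not origin.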
Tamper detection
If even one byte of a blob, traces.jsonl, or spans.jsonl is changed, the hash comparison fails and the script exits with code 1. The script tells you exactly which artifact is wrong.
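The hash comparison can be sketched with the standard library alone. The sketch assumes an extracted pack directory with manifest.json, traces.jsonl, and spans.jsonl at its root, and hex-encoded SHA-256 digests in the manifest. It skips blob_index[] because the shape of its entries is not specified here. `find_tampered` and `sha256_file` are hypothetical names.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path) -> str:
    """Hash a file in chunks so large artifacts need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def find_tampered(pack_dir) -> list:
    """Return the names of artifacts whose hash differs from the manifest."""
    pack = Path(pack_dir)
    manifest = json.loads((pack / "manifest.json").read_text(encoding="utf-8"))
    expected = {
        "traces.jsonl": manifest["traces_sha256"],
        "spans.jsonl": manifest["spans_sha256"],
    }
    return [name for name, digest in expected.items()
            if sha256_file(pack / name) != digest]
```

A command-line wrapper could print the returned names and call `sys.exit(1)` when the list is non-empty, mirroring the exit-code behaviour described above.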
What this is not
Not a prompt engineering tool. No playground, no versioned prompts, no A/B experiments. For that, use Langfuse or similar — and stream its output into monsys for the audit portion.
Not an eval framework. No LLM-as-judge, no built-in datasets. You can inject eval results as span attributes.
Not a blocking guardrail layer. For real-time policy enforcement there are specialised tools — we run async/parallel next to your inference path.