Data model — what we keep and don't

copilot_seats — active seats

Each pull cycle replaces the entire snapshot for that org.

| Field | Source | Hashed? |
| user_login_hash | SHA256(lowercase login) | yes |
| user_login_prefix | First 4 chars + '…' | partial |
| plan_type | "business" / "enterprise" | no |
| assignee_team | Name of the team that assigned the seat | no |
| last_activity_at | RFC3339 timestamp | no |
| last_activity_editor | "VSCode" / "JetBrains" / … | no |
| pending_cancellation_date | DATE or NULL | no |
| snapshot_at | When we pulled | no |
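
The two derived login fields in the table can be sketched as follows. This is a minimal illustration, not the actual worker code: the function names hashLogin and loginPrefix are hypothetical, and the sketch assumes an unsalted hash, hex encoding, and that the prefix is taken from the login as received. The table only states SHA256(lowercase login) and "first 4 chars".

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"strings"
)

// hashLogin returns the hex-encoded SHA-256 of the lowercased login.
// Assumption: no salt and hex output; the source only says SHA256(lowercase login).
func hashLogin(login string) string {
	sum := sha256.Sum256([]byte(strings.ToLower(login)))
	return hex.EncodeToString(sum[:])
}

// loginPrefix returns the first 4 characters of the login followed by "…".
// Assumption: characters are counted as runes, and logins shorter than
// 4 characters are kept whole.
func loginPrefix(login string) string {
	r := []rune(login)
	if len(r) > 4 {
		r = r[:4]
	}
	return string(r) + "…"
}
```

Under these assumptions, hashLogin gives the same value for "Jan.Peeters" and "jan.peeters", and loginPrefix("jan.peeters") keeps only the first four characters before the ellipsis.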

copilot_events — audit-log entries

Append-only. Idempotent on (connection_id, github_event_id).

| Field | Source | Hashed? |
| github_event_id | GitHub's _document_id | no |
| event_type | action field | no |
| actor_login_hash | SHA256(lowercase actor.login) | yes |
| actor_login_prefix | First 4 chars + '…' | partial |
| target_login_hash | SHA256(lowercase user.login) | yes |
| target_login_prefix | First 4 chars + '…' | partial |
| occurred_at | @timestamp field | no |
| payload_json | Rest of GitHub's payload, with PII filtered | partial |
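
The append-only, idempotent behaviour on (connection_id, github_event_id) can be illustrated with an in-memory stand-in for the table. The types Event and EventLog are hypothetical, and the string type for connection_id is an assumption; in a real database the same effect would more likely come from a unique constraint on the two columns.

```go
package main

// eventKey is the idempotency key named in the text.
type eventKey struct {
	ConnectionID  string
	GitHubEventID string
}

// Event is a hypothetical, trimmed-down row of copilot_events.
type Event struct {
	ConnectionID  string // assumption: the real column type may differ
	GitHubEventID string
	EventType     string
}

// EventLog is an in-memory stand-in for the append-only table.
type EventLog struct {
	seen map[eventKey]struct{}
	rows []Event
}

// NewEventLog returns an empty log.
func NewEventLog() *EventLog {
	return &EventLog{seen: make(map[eventKey]struct{})}
}

// Append stores the event unless its key was already seen.
// It reports whether a new row was written; existing rows are never modified.
func (l *EventLog) Append(e Event) bool {
	k := eventKey{ConnectionID: e.ConnectionID, GitHubEventID: e.GitHubEventID}
	if _, dup := l.seen[k]; dup {
		return false
	}
	l.seen[k] = struct{}{}
	l.rows = append(l.rows, e)
	return true
}

// Len returns the number of stored rows.
func (l *EventLog) Len() int { return len(l.rows) }
```

Replaying the same audit-log page twice leaves the row count unchanged, which is what makes overlapping pull windows safe.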

Filtered payload fields

These fields from GitHub audit-log entries are dropped before storage (see isPIIField in hub/api/handlers/copilot_worker.go):

  • actor, user, actor_login, user_login — already stored as hashes
  • actor_id, user_id — internal GitHub user IDs
  • actor_email, user_email, emails
  • name, full_name — display names

Everything else (action codes, org metadata, repository names, business names) goes into payload_json unchanged.
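
A filter over the field names listed above might look like the sketch below. The name isPIIField comes from the text, but this body is an assumption and not a copy of hub/api/handlers/copilot_worker.go; filterPayload is a hypothetical helper, and the sketch only checks top-level keys.

```go
package main

// piiFields holds the keys listed above.
// Assumption: the real isPIIField may use a different list or matching rule.
var piiFields = map[string]struct{}{
	"actor": {}, "user": {}, "actor_login": {}, "user_login": {},
	"actor_id": {}, "user_id": {},
	"actor_email": {}, "user_email": {}, "emails": {},
	"name": {}, "full_name": {},
}

// isPIIField reports whether a payload key is one of the dropped fields.
func isPIIField(key string) bool {
	_, ok := piiFields[key]
	return ok
}

// filterPayload returns a copy of the payload without the PII keys.
// Assumption: only top-level keys are inspected; nested objects pass through.
func filterPayload(payload map[string]any) map[string]any {
	out := make(map[string]any, len(payload))
	for k, v := range payload {
		if !isPIIField(k) {
			out[k] = v
		}
	}
	return out
}
```

Returning a copy rather than deleting in place keeps the raw entry available for computing the login hashes before it is discarded.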

What we DO NOT have

  • Full username, email, display name — dropped at ingest
  • Code suggestions, prompts, completions — Copilot keeps those, we have no access
  • IP addresses — not in GitHub’s admin API
  • Files a dev had open — private to the dev
  • Per-dev usage frequency/duration — only last_activity_at; GitHub doesn’t expose granular timing

How to map a hash back to a person

The user_login_prefix shows jan.… for user jan.peeters (the first 4 characters, then an ellipsis). Match this prefix against your HR system or GitHub org member list. We don't have the mapping and don't want it.
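
On your side, the lookup could be sketched as: narrow your own member list by the prefix, then confirm a candidate by recomputing its hash. This only works if the stored hash is an unsalted, hex-encoded SHA-256 of the lowercased login, which is an assumption here. The names hashLogin and findLogin are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"strings"
)

// hashLogin mirrors the assumed scheme: hex(SHA-256(lowercase login)), no salt.
func hashLogin(login string) string {
	sum := sha256.Sum256([]byte(strings.ToLower(login)))
	return hex.EncodeToString(sum[:])
}

// findLogin narrows candidates by the stored prefix, then confirms with the hash.
// candidates is your own list, e.g. exported from HR or the GitHub org members.
func findLogin(prefix, hash string, candidates []string) (string, bool) {
	// Drop the trailing ellipsis that marks the prefix as truncated.
	stem := strings.ToLower(strings.TrimSuffix(prefix, "…"))
	for _, c := range candidates {
		if !strings.HasPrefix(strings.ToLower(c), stem) {
			continue
		}
		if hashLogin(c) == hash {
			return c, true
		}
	}
	return "", false
}
```

The prefix alone can match several people, so the hash comparison is what settles the match; the candidate list never leaves your environment.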