Security · For the reviewer

If you're the security reviewer evaluating WorkReef, start here.

This page is written for you. The CISO. The compliance officer. The senior security engineer your AI Change Leader handed the deal to. Every claim below is in our living security posture document, kept current as we ship. Is WorkReef enterprise-ready? The foundations are in place. Several operational practices and compliance attestations are still in front of us. We name them at the bottom of this page in the Deferred list. We do not hide what isn't done.

The six questions you walked in with

Mapped to where each one is answered.

1. Where does the inference happen?
Per-tenant LLM routing. Azure OpenAI, Amazon Bedrock, on-prem inference, or a mix. Selected per customer without re-architecting. Model allowlist enforced server-side. Redaction policy applied before egress; tokens restored on the response. See Layer 3 below.
2. Where do the encryption keys live?
Customer-managed. Bring your own key from Azure Key Vault, AWS KMS, or GCP KMS. Each customer's data lives in its own isolated database. No cross-customer access path to discover. See Layer 1.
3. Can I verify what the AI did?
Hash-chained audit log. Every row's chain_hash = sha256(content || previous.chain_hash). Tampering with any row breaks every subsequent hash. Your team can export the table and re-verify offline on their own infrastructure. See Layer 4.
4. What is the agent allowed to do?
Per-action approval gates. For any action class you classify as restricted, the platform refuses to fire it without a fresh, named approval. 24-hour default window. Approval requests and decisions both audit-logged. See Layer 5.
5. What happens when it goes wrong?
Drive layer is conservative and reversible. Promotion gate refuses to advance a candidate forward without ≥30 shadow runs at ≥85% agreement. Backward phase moves (autonomous → assist → off) are explicitly allowed. Per-tenant AI spend caps gate every cost-incurring endpoint. See Recommendation soundness below.
6. What's the attestation status?
SOC 2 Type II, HIPAA BAA, and ISO 27001 are roadmapped, not signed. For HIPAA-relevant deployments (our first customer is one) we work the BAA path as part of onboarding. See the Deferred list at the bottom. We don't hide what isn't done.

The enterprise architecture

Five layers. Each one drops independently.

Each layer answers one concern that a customer security team can raise on its own. When a CISO says "no third-party inference," layer 3 swaps the provider without re-architecting. When they say "we hold the key," layer 1's BYOK gives them cryptographic shredding: revoke the key and the data is unreadable to us.

01
Your data, isolated by storage, not by policy
- Each customer gets their own isolated database. There is no code path for cross-customer access. It is structurally impossible, not a permissions check we hope holds.
- Bring your own key. The platform supports Azure Key Vault, AWS KMS, and GCP KMS as the wrapping authority for credentials and sensitive secrets at rest.
- You control the encryption boundary. If you rotate or revoke the key, your data is unreadable to us.
02
Data minimization, enforced at the storage layer
- Observation summaries are capped at 240 characters by a hard storage-level limit, so a misbehaving extractor cannot persist raw email or chat content even if it tries.
- Stakeholder extraction is opt-in. A master switch plus per-source grants. What flows through inference is what you explicitly turned on.
- Team-level aggregations require k-anonymity of at least 3, so an individual cannot be back-traced from a metric they appear in.
- Persona-based access scoping pre-filters every query at the server. Operators never see data outside their scope, regardless of what the UI requests.
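Two of the limits above lend themselves to a small illustration. The sketch below is not our production schema or query layer. It assumes a hypothetical `observation` table in SQLite and a hypothetical `(team, person_id, value)` row shape, and shows the general idea: the 240-character cap lives in the storage layer as a constraint, and a team metric is only returned when at least 3 distinct people contribute to it.

```python
import sqlite3
from collections import defaultdict
from statistics import mean

SUMMARY_MAX_CHARS = 240  # from the text above
K_MIN = 3                # from the text above

# Hypothetical schema: the CHECK constraint makes the database itself reject
# oversized summaries, whatever the application code tries to write.
SCHEMA = f"""
CREATE TABLE observation (
    id INTEGER PRIMARY KEY,
    summary TEXT NOT NULL CHECK (length(summary) <= {SUMMARY_MAX_CHARS})
)
"""


def try_store(conn: sqlite3.Connection, summary: str) -> bool:
    """Return True if the row was stored, False if the storage layer refused it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO observation (summary) VALUES (?)", (summary,))
        return True
    except sqlite3.IntegrityError:
        return False


def team_averages(rows, k: int = K_MIN) -> dict:
    """rows: iterable of (team, person_id, value).

    Returns {team: mean value} only for teams with at least k distinct people.
    Smaller teams are suppressed rather than reported.
    """
    people = defaultdict(set)
    values = defaultdict(list)
    for team, person_id, value in rows:
        people[team].add(person_id)
        values[team].append(value)
    return {team: mean(values[team]) for team in values if len(people[team]) >= k}
```

Putting the cap in a constraint rather than in application code is what makes it hold even when an extractor misbehaves.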
03
LLM governance: per-tenant routing and redaction
- Choose where inference happens per customer: Azure OpenAI, Amazon Bedrock, your on-prem endpoint, or our managed providers. The rest of the platform does not change when you pick a different backend.
- Model allowlist enforced on the server. If you constrain the platform to a specific set of models, that constraint is structural. No client-side bypass.
- Redaction policy applied before egress. Email, phone, SSN, EIN, and common-name patterns are tokenized before any prompt leaves your trust boundary. Tokens are restored in the response on the way back.
- Per-tenant redaction profiles (off, standard, aggressive, custom) let you tune the bar to your regulator.
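To make the tokenize-then-restore round trip concrete, here is a minimal sketch. It covers only two of the pattern classes named above, email and SSN. The regular expressions, the `[KIND_n]` token format, and the function names are illustrative assumptions, not the platform's actual redaction rules.

```python
import re

# Illustrative patterns only. A real profile would cover more classes
# (phone, EIN, common names) and be tuned per tenant.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> tuple[str, dict[str, str]]:
    """Replace each match with a token; return the redacted text and the token map."""
    mapping: dict[str, str] = {}

    def make_sub(kind: str):
        def _sub(match: re.Match) -> str:
            token = f"[{kind}_{len(mapping) + 1}]"
            mapping[token] = match.group(0)
            return token
        return _sub

    for kind, pattern in PATTERNS.items():
        text = pattern.sub(make_sub(kind), text)
    return text, mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back into a response that still carries tokens."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text
```

In this sketch the token map never leaves the caller, so the provider only ever sees placeholders such as `[EMAIL_1]`.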
04
Tamper-evident audit, verifiable offline
- Every audit entry is cryptographically linked to the one before it. Tampering with any past row breaks the chain at every subsequent entry.
- Your compliance team can export the audit log and re-verify the chain on their own infrastructure. The verification logic is plain and reviewable.
- Logged events cover every LLM call (prompt-token counts and redaction counts, never the prompt content), every action taken, every scope denial, every approval request and decision, every data-access event, every config change, every consent change.
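For a sense of what offline re-verification can look like, the sketch below applies the rule chain_hash = sha256(content || previous.chain_hash) to an exported list of rows. The details are assumptions made for illustration: UTF-8 string content, hex-encoded hashes, an empty string as the predecessor of the first row, and rows represented as dicts. The real export format may differ.

```python
import hashlib
from typing import Iterable, Optional

GENESIS_HASH = ""  # assumption: the first row chains from an empty string


def chain_hash(content: str, previous_hash: str) -> str:
    """sha256(content || previous.chain_hash), hex-encoded."""
    return hashlib.sha256((content + previous_hash).encode("utf-8")).hexdigest()


def build_chain(contents: Iterable[str]) -> list[dict]:
    """Build rows the way a writer would, each linked to the one before it."""
    rows = []
    prev = GENESIS_HASH
    for content in contents:
        prev = chain_hash(content, prev)
        rows.append({"content": content, "chain_hash": prev})
    return rows


def first_broken_row(rows: list[dict]) -> Optional[int]:
    """Return the index of the first row whose stored hash fails, or None if intact."""
    prev = GENESIS_HASH
    for i, row in enumerate(rows):
        if chain_hash(row["content"], prev) != row["chain_hash"]:
            return i
        prev = row["chain_hash"]
    return None
```

Editing a row's content breaks that row. Recomputing that row's hash to hide the edit just moves the break to the next row, which is why a tamperer would have to rewrite every later entry.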
05
Per-action approval gates
- For any action class you classify as restricted, the platform refuses to fire it without a fresh, named approval. Default expiry window is 24 hours.
- Approval requests and decisions are themselves audit-logged. Who approved what, when, and what fired afterward, all reviewable from one timeline.
- Gates are configured per tenant; you decide which actions need an approver vs which are allowed to run on their own.
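The gate logic above can be sketched as a single check. The `Approval` record and `may_fire` function are hypothetical names. The behaviour they model is the one described: a restricted action class needs a named approval for that class that is no older than the window, 24 hours by default.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

APPROVAL_WINDOW = timedelta(hours=24)  # default window from the text


@dataclass(frozen=True)
class Approval:
    action_class: str
    approver: str          # a named person, never empty
    approved_at: datetime


def may_fire(
    action_class: str,
    restricted: set[str],
    approval: Optional[Approval],
    now: datetime,
    window: timedelta = APPROVAL_WINDOW,
) -> bool:
    """Return True only if the action may run right now."""
    if action_class not in restricted:
        return True  # unrestricted classes run on their own
    if approval is None or not approval.approver:
        return False  # no approval, or an unnamed one
    if approval.action_class != action_class:
        return False  # approval was for a different action class
    age = now - approval.approved_at
    return timedelta(0) <= age <= window  # fresh, and not dated in the future
```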

Runtime posture

In place today.

Your data lives in its own database

Each customer's tenant data is in a separate database. No app-level filter to forget; no cross-customer access path to discover.

Sessions are signed and short-lived

14-day session TTL with secure cookie flags in production. Sessions stay locked to the host they were issued on.

Every write protected against CSRF

Cross-site request forgery tokens enforced on every state-changing request, with HTMX + plain-form support behind the same primitive. OAuth callbacks are exempted explicitly so the contract is reviewable.

Production refuses to boot misconfigured

Default passwords, short keys, and plaintext URLs in production all abort the process at startup rather than silently shipping an insecure configuration.
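A startup check of this kind might look like the sketch below. Everything specific in it is an assumption for illustration: the setting names (`ENV`, `ADMIN_PASSWORD`, `SECRET_KEY`, `PUBLIC_BASE_URL`), the list of default passwords, the 32-byte minimum, and the reading of "plaintext URL" as anything that is not `https://`.

```python
DEFAULT_PASSWORDS = {"", "password", "changeme", "admin"}  # hypothetical list
MIN_KEY_BYTES = 32                                         # assumed threshold


def config_errors(cfg: dict) -> list[str]:
    """Return every reason this config must not run in production."""
    if cfg.get("ENV") != "production":
        return []  # the strict checks apply to production only
    errors = []
    if cfg.get("ADMIN_PASSWORD", "") in DEFAULT_PASSWORDS:
        errors.append("ADMIN_PASSWORD is unset or a known default")
    if len(cfg.get("SECRET_KEY", "").encode("utf-8")) < MIN_KEY_BYTES:
        errors.append(f"SECRET_KEY is shorter than {MIN_KEY_BYTES} bytes")
    if not cfg.get("PUBLIC_BASE_URL", "").startswith("https://"):
        errors.append("PUBLIC_BASE_URL is not https")
    return errors


def boot(cfg: dict) -> None:
    """Abort the process before serving traffic if the config is unsafe."""
    errors = config_errors(cfg)
    if errors:
        raise SystemExit("refusing to boot: " + "; ".join(errors))
```

Collecting all errors before aborting means an operator sees every problem in one failed start instead of fixing them one restart at a time.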

Background jobs that survive crashes

If a worker dies mid-job, its lease is reclaimed automatically and the work re-queued. No silent data loss when an instance restarts.
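The lease idea can be shown with an in-memory sketch. The `Job` record, the state names, and the 60-second lease are assumptions. The point is only that a running job whose lease has expired goes back to the queue instead of being lost.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    id: str
    state: str = "queued"                    # queued | running | done
    lease_expires_at: Optional[float] = None


def claim(job: Job, now: float, lease_seconds: float = 60.0) -> None:
    """A worker takes the job and holds a time-limited lease on it."""
    job.state = "running"
    job.lease_expires_at = now + lease_seconds


def reclaim_expired(jobs: list[Job], now: float) -> list[str]:
    """Re-queue running jobs whose lease ran out; return their ids."""
    reclaimed = []
    for job in jobs:
        expired = job.lease_expires_at is not None and job.lease_expires_at <= now
        if job.state == "running" and expired:
            job.state = "queued"
            job.lease_expires_at = None
            reclaimed.append(job.id)
    return reclaimed
```

A healthy worker would renew its lease while it works. A crashed one cannot, so its job is picked up on the next reclaim pass.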

External calls retry without cascading

Three attempts with exponential backoff. A flaky vendor API does not bring down the surrounding sync; failures are recorded as the operation result, not as crashes.
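As an illustration of retrying without crashing, the sketch below makes up to three attempts with a doubling delay and returns a result object in either case. The `CallResult` type, the 0.5-second base delay, and the injectable `sleep` parameter are assumptions made for the example.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class CallResult:
    ok: bool
    value: Any = None
    error: Optional[str] = None
    attempts: int = 0


def call_with_retry(
    fn: Callable[[], Any],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> CallResult:
    """Try fn up to `attempts` times, doubling the delay between tries.

    A final failure is returned as a result, not raised, so one flaky
    vendor call cannot take down the sync that surrounds it.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return CallResult(ok=True, value=fn(), attempts=attempt)
        except Exception as exc:  # recorded, never propagated
            last_error = repr(exc)
            if attempt < attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    return CallResult(ok=False, error=last_error, attempts=attempts)
```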

Graceful degradation when an LLM is down

If your inference provider is unavailable, the digest still renders, opportunities still surface via heuristics, and the Architect still produces an analysis. AI outage is not platform outage.

Per-tenant AI spend cap

A monthly dollar cap per customer is enforced as a pre-flight on every cost-incurring endpoint. Once you hit it, further LLM calls return a clear error instead of silently overspending.
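A pre-flight check of this shape might look as follows. The function and exception names are hypothetical, and where the month-to-date spend figure comes from is left out. The sketch only shows that the check runs before the provider is called, and that hitting the cap produces an explicit error.

```python
from decimal import Decimal
from typing import Any, Callable


class SpendCapReached(Exception):
    """Raised instead of making the LLM call once the monthly cap is hit."""


def preflight_spend_check(spent_this_month: Decimal, monthly_cap: Decimal) -> None:
    if spent_this_month >= monthly_cap:
        raise SpendCapReached(
            f"monthly AI spend cap of ${monthly_cap} reached (spent ${spent_this_month})"
        )


def guarded_llm_call(
    call: Callable[[], Any], spent_this_month: Decimal, monthly_cap: Decimal
) -> Any:
    """Run the pre-flight first; the provider is never contacted if it fails."""
    preflight_spend_check(spent_this_month, monthly_cap)
    return call()
```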

Tenant data export + deletion

Admin one-click flows. Export bundles every table from your tenant database as a zip (credentials redacted). Delete drops the tenant database and tombstones the control-plane record. True right-to-be-forgotten.

Encryption key rotation as a procedure

Rotating the platform encryption key is a documented operations procedure. Idempotent, per-row failure-tolerant, with dry-run support so you can verify against staging before running it on production.

The AI-platform-specific risk surface

Recommendation soundness.

The Architect drafts the analysis per task. A panel of frontier models (Claude, GPT, Gemini, configurable) then votes across four dimensions: cost/benefit, human sensitivity, customer risk, compliance. A deliberator pass synthesizes the verdict and names the agreement and disagreement explicitly. The recommendation gate refuses to surface do_now when the panel didn't form or didn't agree. It silently downgrades to pilot so the home page never shows an unvetted "AI-takeover ready today" call.

AI feasibility is grounded in real operational signal. The Architect's quorum reads measured recurrence, uniformity, and handler consistency from your live data sources: CloudWatch alarm history, log clustering, Jira issue clusters, GitHub Actions failure patterns. When two or more sources flag the same task as an AI candidate, the operator sees the cross-source confirmation at proposal review time. Compliance is a hard veto. A PHI task stays human-only at perfect recurrence numbers.
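The recommendation gate and the compliance veto described above reduce to a small decision function. This is a sketch under assumptions: `PanelOutcome`, `touches_phi`, and the `human_only` label are names invented for the example, while `do_now` and `pilot` are the values named in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PanelOutcome:
    formed: bool  # did the panel produce votes at all?
    agreed: bool  # did the deliberator report agreement?


def surfaced_recommendation(proposed: str, panel: PanelOutcome, touches_phi: bool) -> str:
    """Decide what the operator actually sees."""
    if touches_phi:
        return "human_only"  # compliance is a hard veto, whatever the numbers say
    if proposed == "do_now" and not (panel.formed and panel.agreed):
        return "pilot"       # unvetted do_now is downgraded, never shown as-is
    return proposed
```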

The drive layer is conservative and reversible. The promotion gate only proposes advancing a candidate when it clears all three thresholds: thirty shadow runs, eighty-five percent agreement with the human baseline, and fifty percent of those runs scored against ground truth. Backward moves (autonomous to assist to off) are always allowed. Mock-provisioning at handoff lands a placeholder agent with an intentionally narrow tool scope (no Teams, no calendar, no destructive surface) until the operator wires real credentials.
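The three thresholds and the rule that backward moves are always allowed can be sketched as below. The phase list, the `ShadowStats` record, reading each threshold as "at least", and the choice to allow only one forward step at a time are assumptions for illustration.

```python
from dataclasses import dataclass

PHASES = ["off", "assist", "autonomous"]  # ordered from least to most autonomy

MIN_SHADOW_RUNS = 30
MIN_AGREEMENT_PCT = 85
MIN_GROUND_TRUTH_PCT = 50


@dataclass(frozen=True)
class ShadowStats:
    runs: int                 # shadow runs completed
    agreeing: int             # runs that matched the human baseline
    ground_truth_scored: int  # runs scored against ground truth


def clears_promotion_gate(s: ShadowStats) -> bool:
    """All three thresholds must hold. Integer math avoids float rounding."""
    if s.runs < MIN_SHADOW_RUNS:
        return False
    agreement_ok = s.agreeing * 100 >= MIN_AGREEMENT_PCT * s.runs
    scored_ok = s.ground_truth_scored * 100 >= MIN_GROUND_TRUTH_PCT * s.runs
    return agreement_ok and scored_ok


def move_allowed(current: str, target: str, stats: ShadowStats) -> bool:
    """Backward moves are always allowed; forward moves must clear the gate."""
    ci, ti = PHASES.index(current), PHASES.index(target)
    if ti <= ci:
        return True
    return ti == ci + 1 and clears_promotion_gate(stats)
```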

Partial

Where we are mid-stride.

Things that work today but need the next push. Each has an owner and a path to ✅.

Rate limiting

In place on every public surface: sign-in, password, signup, integration connect, agent run, report generation. Fails open if the limiter is unreachable so legitimate traffic is not punished.

Next: Extend to admin actions and add per-customer AI spend rate caps in addition to the monthly dollar cap.
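The fail-open behaviour can be illustrated with a small wrapper. `LimiterUnavailable` and `allow_request` are hypothetical names, and `check` stands in for whatever call asks the limiter backend about a key.

```python
from typing import Callable


class LimiterUnavailable(Exception):
    """The limiter backend could not be reached."""


def allow_request(check: Callable[[str], bool], key: str) -> bool:
    """Ask the limiter; if it cannot answer, let the request through."""
    try:
        return check(key)
    except LimiterUnavailable:
        return True  # fail open: an outage in the limiter does not block users
```

Failing open is a trade-off. It favours availability over protection while the limiter is down, which matches the behaviour described above.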

API request validation

Typed input validation on every state-changing endpoint. Unknown fields are rejected, not silently ignored.

Next: Promote the remaining form-style handlers to the same typed validator so the contract is uniform across the whole API.
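Rejecting unknown fields can be sketched with nothing more than a dataclass. `CreateReportRequest` and `parse_request` are invented for the example and say nothing about which validation library the API actually uses.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CreateReportRequest:  # hypothetical request shape
    title: str
    days: int


def parse_request(cls, payload: dict):
    """Build cls from payload, rejecting unknown, missing, or mistyped fields."""
    spec = {f.name: f.type for f in fields(cls)}
    unknown = set(payload) - set(spec)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    missing = set(spec) - set(payload)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    for name, expected in spec.items():
        if type(payload[name]) is not expected:  # exact match, so True is not an int
            raise ValueError(f"{name} must be {expected.__name__}")
    return cls(**payload)
```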

CI pipeline

Every change runs through linting, static type checks, a dependency vulnerability scan, the test suite with a coverage floor, an OpenAPI snapshot check, and a real container boot against Postgres and Redis before it can merge.

Next: Promote type-checking to required once the strict perimeter covers the whole codebase. Add static application security testing.

Backups

Point-in-time backups with default retention.

Next: Bump retention to 30 days for production and run a restore drill each quarter.

Deferred

Known gaps. Named up front.

These are intentionally not addressed yet. They're operational practices the deployment team takes on, or scope for follow-up engineering. For each new tenant we review this list against their requirements. If they need SOC 2 or HIPAA today, that's a sales conversation about timeline.

Browser-driven end-to-end testing (Playwright).
Load + chaos testing. Worker pool capacity at scale untested.
OAuth refresh-token rotation for short-lived tokens.
PII / PHI tagging at the field level.
Data residency. Single Azure region today.
Formal compliance attestations (SOC 2 Type II, HIPAA BAA, ISO 27001).
Customer-managed encryption keys (per-tenant KMS wrapping).
WAF in front of the public ingress.

CISO meeting on the calendar?

We can walk through the audit chain verifier, the customer-managed-key flow, the per-tenant inference routing, and the current security posture live. Bring questions.
