Built like infrastructure, not like a demo.
If you're the CTO, VP of Engineering, or senior architect being asked whether WorkReef is safe to bring inside your stack, this page is for you. The short version: cloud-native, container-deployed, structurally isolated per customer, durable under crashes, gated by hard CI on every change. The long version is below: what your team will inherit, what we publish so they can audit, and where the gaps are.
Boring tech, where boring is the right answer.
We deliberately picked the lowest-novelty stack that solves the problem. Your team won't be on-call for an exotic database. They won't be debugging a hand-rolled job queue. The interesting work is in the agent + governance layer, not in re-inventing infrastructure.
Server
Python web service rendered server-side. Standard async HTTP framework. No client-side single-page app. The UI is HTML with progressive enhancement, so a browser can read pages straight off the wire and your security tooling can inspect them.
Database
Postgres for both control-plane and tenant data, with vector indexes for embeddings. Schema migrations run in both directions, up and down. Foreign-key and uniqueness constraints enforced at the database level on every boundary that matters.
Queue
Background jobs live in Postgres, claimed atomically. Workers heartbeat their leases. If a worker dies mid-job, the lease expires and another worker picks up the work. No separate message broker to operate.
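For readers who want to picture the claim step, here is a minimal sketch of one common way to claim a job atomically in Postgres, using `FOR UPDATE SKIP LOCKED`. It is an illustration, not WorkReef's actual query: the `jobs` table, its columns, and the parameter names are all assumptions, and the placeholders use the pyformat style accepted by drivers such as psycopg.

```python
# Hypothetical schema: jobs(id, payload, status, claimed_by,
# lease_expires_at, created_at). Only the SQL constructs are standard.

# Claim one job: either never started, or held under an expired lease.
# SKIP LOCKED makes concurrent workers pass over rows another
# transaction is already claiming instead of blocking on them.
CLAIM_JOB_SQL = """
UPDATE jobs
SET status = 'running',
    claimed_by = %(worker_id)s,
    lease_expires_at = now() + make_interval(secs => %(lease_seconds)s)
WHERE id = (
    SELECT id
    FROM jobs
    WHERE status = 'queued'
       OR (status = 'running' AND lease_expires_at < now())
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload;
"""

# Heartbeat: extend the lease, but only if this worker still owns it.
HEARTBEAT_SQL = """
UPDATE jobs
SET lease_expires_at = now() + make_interval(secs => %(lease_seconds)s)
WHERE id = %(job_id)s
  AND claimed_by = %(worker_id)s
  AND status = 'running';
"""


def claim_params(worker_id: str, lease_seconds: int = 60) -> dict:
    """Build the parameter mapping for CLAIM_JOB_SQL."""
    return {"worker_id": worker_id, "lease_seconds": lease_seconds}
```

A worker would execute `CLAIM_JOB_SQL` inside a transaction and treat an empty result as "nothing to do right now".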
Cache + state
Redis for sessions, rate limits, and ephemeral state. TLS required in production. Sessions use signed cookies with secure flags; auth is server-validated on every request.
Tenancy
Each customer gets their own database. The request router resolves which database engine to use before the handler runs, so there is no app-level customer filter to forget. The cross-customer access path doesn't exist. Not because we enforce a policy. Because the query has nowhere to land.
Agents
Each agent is a row with a stable identifier, a system brief, a schedule, and a tool scope. When the schedule fires, the worker loads context, calls the inference provider, and persists the run with its prompt, tool calls, and output. Every run is append-only audit.
Customer isolation enforced by the database, not by a filter.
The common SaaS mistake is to share one database across customers and sprinkle a customer-id filter across every query, then trust code review to catch the misses. We rejected that pattern at design time. Each customer has their own database; the request router picks the right engine before the handler runs.
For you, this means three things. One: a misconfigured query in our code cannot leak across customers, because the query has no other customer to land on. Two: customer-managed encryption can be applied at the database level, not the row level. Three: tenant deletion is a single operation, dropping the database. Your right-to-be-forgotten promise becomes one step with predictable semantics, not a sweep across a hundred tables.
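The routing idea described above can be sketched in a few lines. This is an illustration under stated assumptions, not the platform's code: `TenantRouter`, `UnknownTenant`, `handle_request`, and the choice to identify a customer by request host are all hypothetical.

```python
from typing import Callable


class UnknownTenant(Exception):
    """Raised when a request cannot be mapped to a customer database."""


class TenantRouter:
    """Maps a request host to exactly one customer's database DSN."""

    def __init__(self, dsn_by_host: dict[str, str]):
        self._dsn_by_host = {h.lower(): d for h, d in dsn_by_host.items()}

    def resolve(self, host: str) -> str:
        try:
            return self._dsn_by_host[host.lower()]
        except KeyError:
            # No fallback database: an unmapped request goes nowhere.
            raise UnknownTenant(host) from None


def handle_request(router: TenantRouter, host: str,
                   handler: Callable[[str], str]) -> str:
    # Resolution happens before the handler runs, so the handler only
    # ever holds a connection target for its own customer.
    dsn = router.resolve(host)
    return handler(dsn)
```

Because the handler receives a single DSN and nothing else, there is no customer-id predicate for a query to omit.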
Things that don't disappear when something goes wrong.
Every async unit of work runs through one durable queue. Jobs are claimed atomically so concurrent workers don't trip over each other. Each claim takes a time-bounded lease; if the worker holding the lease dies, the lease expires and the work is re-queued. Repeated dispatches with the same idempotency key collapse to one execution.
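To make the lease and idempotency semantics concrete, here is a small in-memory model. It is a sketch, not the production queue: in the real system the atomicity comes from the database, whereas this single-threaded model only shows the state transitions. `LeaseQueue`, `Job`, and every method name are hypothetical, and time is passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    key: str                       # idempotency key
    payload: str
    status: str = "queued"         # queued | running | done
    lease_owner: Optional[str] = None
    lease_expires_at: float = 0.0


class LeaseQueue:
    def __init__(self, lease_seconds: float = 30.0):
        self.lease_seconds = lease_seconds
        self.jobs: dict[str, Job] = {}

    def dispatch(self, key: str, payload: str) -> bool:
        """Enqueue once per idempotency key; repeats collapse to a no-op."""
        if key in self.jobs:
            return False
        self.jobs[key] = Job(key, payload)
        return True

    def claim(self, worker: str, now: float) -> Optional[Job]:
        """Take a queued job, or one whose lease has expired."""
        for job in self.jobs.values():
            expired = job.status == "running" and job.lease_expires_at <= now
            if job.status == "queued" or expired:
                job.status = "running"
                job.lease_owner = worker
                job.lease_expires_at = now + self.lease_seconds
                return job
        return None

    def heartbeat(self, key: str, worker: str, now: float) -> bool:
        """Extend the lease, but only for the current owner."""
        job = self.jobs.get(key)
        if job is None or job.status != "running" or job.lease_owner != worker:
            return False
        job.lease_expires_at = now + self.lease_seconds
        return True

    def complete(self, key: str, worker: str) -> bool:
        """Mark done; a worker that lost its lease cannot complete the job."""
        job = self.jobs.get(key)
        if job is None or job.status != "running" or job.lease_owner != worker:
            return False
        job.status = "done"
        return True
```

In this model, a worker that dies simply stops calling `heartbeat`; once `now` passes `lease_expires_at`, the next `claim` hands the job to someone else, and the original worker's late `complete` is rejected.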
- External calls retry with exponential backoff. A flaky vendor API surfaces as an explicit result, not a crash that cascades.
- If an inference provider is down, the platform degrades gracefully. Heuristics fill in, the digest still renders, the operator still gets something.
- Per-customer monthly AI spend cap enforced pre-flight on every cost-incurring path. Once you hit it, further LLM calls return a clear error rather than silently overspending.
- Connecting an integration kicks off all downstream work automatically. Failures retry inside the queue rather than surfacing as broken UX.
- Encryption key rotation is a documented procedure. Idempotent, per-row failure-tolerant, with dry-run support against staging.
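Two of the behaviors in the list above, retrying with exponential backoff and checking the spend cap before a call, can be sketched as follows. This is a minimal illustration rather than the platform's implementation; `CallResult`, `call_with_backoff`, `SpendCapExceeded`, `check_spend_cap`, and the specific delays and units are assumptions.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CallResult:
    """Explicit outcome of an external call: success or a recorded error."""
    ok: bool
    value: Optional[str] = None
    error: Optional[str] = None
    attempts: int = 0


def call_with_backoff(fn: Callable[[], str], max_attempts: int = 4,
                      base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> CallResult:
    """Retry fn, doubling the delay after each failure.

    Exhausting the attempts returns a failed CallResult instead of
    raising, so a flaky vendor does not crash the caller.
    """
    last_error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return CallResult(ok=True, value=fn(), attempts=attempt)
        except Exception as exc:
            last_error = repr(exc)
            if attempt < max_attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    return CallResult(ok=False, error=last_error, attempts=max_attempts)


class SpendCapExceeded(Exception):
    """Raised before a cost-incurring call that would pass the cap."""


def check_spend_cap(spent_cents: int, estimated_cents: int,
                    cap_cents: int) -> None:
    """Pre-flight check: refuse the call rather than overspend."""
    if spent_cents + estimated_cents > cap_cents:
        raise SpendCapExceeded(
            f"monthly cap {cap_cents} would be exceeded "
            f"({spent_cents} spent, {estimated_cents} estimated)"
        )
```

The `sleep` parameter exists only so the backoff schedule can be observed without waiting; production code would leave it at the default.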
Every change runs the same gauntlet production runs.
Every pull request is checked by seven gates before it can merge. None is "warning only." A failed gate means the change does not ship.
Style + obvious bugs
Fast linter catches unused imports, dead code, common bug patterns, formatting drift. The codebase stays scannable.
Type checking
Strict type-checking on a perimeter that ratchets outward every release. Legacy modules sit under per-module overrides that allow existing errors but block new ones in the same class. The ratchet only tightens.
Dependency vulnerability scan
A known-vulnerable transitive dependency fails the merge. Static analysis covers the bug classes that can be detected mechanically.
Test suite + coverage floor
The full test suite runs against a real Postgres in CI. Coverage floor only goes up, never down. Removing tests does not lower the floor; the tests that remain have to cover the gap.
Schema migration drift
Migrations are applied from scratch and then round-tripped (one step back, one step forward) on real Postgres in CI. Tables present in the database but not in the models are refused.
API surface snapshot
Any change to the public API requires regenerating the snapshot and committing the diff. API changes become deliberate, reviewable acts. Not refactor side-effects.
Real container boot
The full container image is built and booted in CI. The preflight checks must pass. Health endpoints must return non-error. The login page must actually render. Anything an in-process test can't catch, this catches.
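As one example of how such a gate can work, the API surface snapshot check described above reduces to comparing a freshly generated schema with the committed one. The sketch below is an assumption about the shape of that check, not the actual CI script; `canonical` and `snapshot_diff` are hypothetical names, and the schema is treated as plain JSON.

```python
import difflib
import json


def canonical(schema: dict) -> str:
    """Serialize with sorted keys so key order never counts as drift."""
    return json.dumps(schema, sort_keys=True, indent=2)


def snapshot_diff(current: dict, committed_text: str) -> list[str]:
    """Return a unified diff; an empty list means the API is unchanged."""
    old = canonical(json.loads(committed_text)).splitlines()
    new = canonical(current).splitlines()
    return list(difflib.unified_diff(old, new, "committed", "generated",
                                     lineterm=""))
```

A CI step would fail when the returned list is non-empty and print the diff, which is what turns an API change into something a reviewer has to look at.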
Whenever "two places must stay in sync," we add a test.
The cheapest tests we write. They run on every change and they're the only thing that scales with team size.
Database enum alignment
Every code-level enum that's stored in the database is checked against the actual database enum. Renames, additions, removals. All caught. The class of bug that crash-loops a scheduler when someone renames a value is now structurally impossible.
Route smoke
Every public page returns a non-error. Catches template-render bugs, missing context variables, broken globals. Anything that would 500 in the user's browser.
API snapshot
The public API surface is pinned by a committed snapshot. Drift requires regenerating it and reviewing the diff.
Connector catalog drift
If a connector claims it pulls a capability (cost, usage, work items) it has to actually implement that capability. The catalog cannot lie about what data is flowing.
Schema validation
Every input model in the API surface validates at startup. The class of "the validator can't even load" bugs is caught before the request lands.
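A sync test of this kind is usually only a few lines. Here is a minimal sketch of the enum alignment idea, assuming the database labels have already been fetched (in Postgres they live in the `pg_enum` catalog). `JobStatus` and `enum_drift` are hypothetical names used for illustration.

```python
from enum import Enum


class JobStatus(Enum):
    """Hypothetical code-level enum that is also stored in the database."""
    QUEUED = "queued"
    RUNNING = "running"
    DONE = "done"


def enum_drift(code_enum: type[Enum], db_labels: set[str]) -> dict[str, set[str]]:
    """Compare enum values in code with the labels the database knows.

    A rename shows up on both sides: the new value is missing in the
    database and the old label is missing in code.
    """
    code_values = {member.value for member in code_enum}
    return {
        "missing_in_db": code_values - db_labels,
        "missing_in_code": db_labels - code_values,
    }
```

A test would assert that both sets are empty for every stored enum, so a rename fails in CI instead of in a running scheduler.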
The container refuses to serve traffic from a broken build.
Before the web service is allowed to accept its first request, a preflight script runs inside the container. It verifies that the database schema matches the models. It verifies that every enum the code expects is actually in the database. It verifies that the in-process health endpoints return non-error. It verifies that the login page actually renders.
If any check fails, the preflight sentinel is never written. The web, worker, and scheduler processes stay parked. Your load balancer sees a 503 from a healthy proxy in front of broken instances. A broken page never reaches a customer.
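The sentinel pattern described above can be sketched as follows. This is an illustration under assumptions, not the shipped preflight script: `run_preflight`, `may_serve`, the check names, and the sentinel path are all hypothetical.

```python
from pathlib import Path
from typing import Callable


def run_preflight(checks: dict[str, Callable[[], bool]],
                  sentinel: Path) -> list[str]:
    """Run every check; write the sentinel only if all of them pass.

    Returns the names of failed checks. A check that raises counts as
    a failure rather than aborting the remaining checks.
    """
    # Remove any sentinel left over from a previous boot first.
    sentinel.unlink(missing_ok=True)
    failures: list[str] = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            ok = False
        if not ok:
            failures.append(name)
    if not failures:
        sentinel.write_text("ok\n")
    return failures


def may_serve(sentinel: Path) -> bool:
    """Web, worker, and scheduler processes stay parked until this is true."""
    return sentinel.exists()
```

Because the sentinel is removed before the checks run, a previously healthy boot cannot vouch for a broken one.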
Want to dig deeper?
We're happy to walk through the architecture with serious prospects. Bring your senior engineer and your security counterpart on the same call. That's the conversation we like.