02 / Wingman / Architecture

The Runtime Defence Architecture

Wingman pattern-matches every user request against known AI failure modes and generates targeted defences before output reaches the user. Pre-loaded from real corpora; continues learning at runtime.

The runtime defence architecture

A Wingman skill is not a static authored entity. It's a runtime-generated defence against a matched failure-mode pattern, with both the patterns and the defence-generation logic living in the contextual database.

The contextual database content

The contextual database holds two kinds of content:

Failure-mode patterns. Each pattern is a (signature, failure mode) record. The signature characterises what kinds of request, intent shape, or conversational context produce a particular failure. The failure mode names what goes wrong — overconfident reasoning, hallucinated specifics, sycophantic agreement, ungrounded claims on present-tense questions, argument loops, pattern-matching over investigation, permission-asking on asymmetric-cost decisions, and others.

Example defences. For known patterns, the database holds defences that have proven effective — the (injection, filter) pairs that successfully prevented the failure mode in past use. These are reference material for defence generation rather than fixed templates.

Patterns and defences are indexed in the same intention-encoding space used by Acumen. Retrieval of relevant patterns for a current request happens through encoding similarity.
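A minimal sketch of what a pattern record and similarity-based retrieval could look like. The `FailurePattern` shape, the cosine metric, and the 0.8 threshold are all assumptions for illustration — the actual encoding model and retrieval mechanics are not specified here.

```python
import math
from dataclasses import dataclass


@dataclass
class FailurePattern:
    # Hypothetical record shape; field names are assumptions.
    signature: list[float]                    # intention-space encoding of the triggering context
    failure_mode: str                         # e.g. "hallucinated-specifics"
    example_defences: list[tuple[str, str]]   # (injection, filter) pairs that worked before


def cosine(a, b):
    """Cosine similarity between two encoding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(request_encoding, patterns, threshold=0.8):
    """Return the failure-mode patterns whose signature is close to the request encoding."""
    return [p for p in patterns if cosine(request_encoding, p.signature) >= threshold]
```

Retrieval in this sketch is a flat scan; at catalogue scale an approximate-nearest-neighbour index over the same encoding space would serve the same role.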

The runtime defence pipeline

When a user request arrives, Wingman performs three steps:

  • Pattern matching. The request is encoded; matching failure-mode patterns are retrieved from the contextual database via encoding similarity. The matches identify what kinds of failure are likely against this request.
  • Defence generation. For each matched pattern, a defence is generated — an (injection, filter) pair tailored to the specific request. The injection becomes a directive added to the prompt context. The filter becomes a pattern-matcher applied to the model's response.
  • Defence execution. The injection-augmented prompt goes to the model. The response runs through the filters. If any filter triggers, the intervention loop kicks in — the model is re-prompted with stronger guidance, the response is regenerated, and the cycle continues until the filters are clean or an iteration limit is reached.

Pre-loading from multiple corpora

The contextual database is bootstrapped from corpora that contain codified knowledge about how AI fails and what defends against it.

Phil's six-month corpus. Code-related Claude logs and phone conversations spanning six-plus months. An engineer-grade user getting unusually high value from AI through deliberate technique. The extracted patterns are those that produce successful interaction for someone who knows what they're doing — precisely the discipline Wingman gives to users who don't.

The public prompt-engineering corpus. Articles, papers, courses, guides, blog posts, conference talks. Tens of thousands of practitioners contributing observations about what fails and what works. Structurally biased toward documenting failures.

AI safety and red-teaming research. Academic and industry work on hallucination, sycophancy, jailbreaking, prompt injection, alignment failures. Substantial body of structured knowledge about how AI fails.

Anthropic's published material on Claude. Documentation, model cards, behaviour guides. Anthropic-specific knowledge about how their model fails and what works.

Runtime learning

Pre-loading produces breadth. Runtime learning produces depth specific to how this organisation actually uses AI.

The frustration signal. When a user submits a request similar in intent to a previous request but with more words, this is the rephrasing-as-dissatisfaction signal. The user wasn't satisfied with the previous response; they're trying again with more specification.

This signal is automatic (no user action required), naturally weighted toward important failures (the user only rephrases when the failure mattered enough), robust against sycophancy bias, and self-correcting (as the system improves at handling a request type, the rephrasing rate drops).
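A minimal sketch of how the signal could be detected: same intent (high encoding similarity) plus more specification (a longer request). The cosine metric, the 0.85 threshold, and the word-count proxy for "more specification" are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two encoding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def is_rephrasing(prev_encoding, prev_text, new_encoding, new_text, threshold=0.85):
    """Rephrasing-as-dissatisfaction: similar intent, more words."""
    similar_intent = cosine(prev_encoding, new_encoding) >= threshold
    more_words = len(new_text.split()) > len(prev_text.split())
    return similar_intent and more_words
```

Because both inputs come from the normal request stream, this check runs on every request with no user action required.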

The learning pipeline. When frustration triggers, the failed exchange is captured: original request encoding, response that triggered frustration, rephrased request. Failure mode classified. Pattern extracted. Defence generated. New (pattern, example defence) pair added to the contextual database. Future requests with similar intention get the defence applied.
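The capture-classify-extract-store sequence can be sketched as follows. The record shape and function names are hypothetical, and the classifier and defence generator are passed in as stubs — in the described architecture both would be model-driven.

```python
from dataclasses import dataclass


@dataclass
class FailedExchange:
    # Hypothetical capture record; field names are assumptions.
    request_encoding: list[float]   # encoding of the original request
    failed_response: str            # the response that triggered frustration
    rephrased_request: str          # the user's retry, with more specification


def learn(exchange, classify_failure, generate_defence, database):
    """Frustration-triggered learning: classify, generate, store."""
    failure_mode = classify_failure(exchange)
    defence = generate_defence(failure_mode, exchange)
    # The request encoding itself serves as the pattern signature in this sketch.
    database.append((exchange.request_encoding, failure_mode, defence))
```

Once stored, the new (pattern, example defence) pair is retrievable by encoding similarity like any pre-loaded pattern, so future requests with similar intention pick up the defence automatically.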

Why this is better than authored skills

Static skill triples either apply to a request or don't. Generated defences can be tuned to the specific intent of the current request. The pattern catalogue can be much larger than a hand-authored skill catalogue — patterns are smaller, more specific, easier to extract from corpora. Composition across patterns becomes natural — multiple injections in the prompt, multiple filters on the response.

Status

Architecture commitments stable: pattern matching plus runtime defence generation; multiple corpus pre-loading; runtime learning loop; three-domain pattern layering (universal, domain, organisation). Specific implementation details (encoding model, defence generator model, iteration limits, disambiguation logic) in formation.

Wingman is post-Guardian — depends on Guardian being mature enough to host Wingman's runtime.

Want to discuss the architecture?

[email protected]