A statistical measurement architecture for runtime governance of AI agent deployments. It treats agent behaviour as a measurable quantity, not a self-reported claim. The core statistical engine is TED -- the Transform Engine for Decision Evidence.
We do not ask the nuclear reactor if it is safe. We measure the neutron flux with external sensors.
Current AI risk management asks the agent to grade its own homework. The failure mode of the safety mechanism is correlated with the failure mode of the agent. CARF breaks that correlation by measuring behaviour externally, using behavioural telemetry and distribution-free statistical methods under stated assumptions.
In deployed black-box systems, the safety-relevant property is often only partially accessible from internal state alone. In those settings, rigorous empirical measurement is not a fallback. It is the correct interface for operational control.
When the property of interest cannot be directly inspected, calibrated measurement over observables becomes the safety contract.
The Agentic Manifold of Behavioural Attestation.
AMBA observes agent behaviour through external telemetry and produces structured, replayable evidence. It does not make governance decisions -- it measures.
AMBA ingests raw multi-turn interactions and extracts behavioural state. It normalises per-turn sensor vectors across four families -- structural, semantic, interaction, and control -- distinguishing structural zeros (behaviour absent) from missing data (sensor failure). The mask ensures silence is never confused with compliance.
AMBA discovers fragility space from behavioural phenotype vectors and maintains a signature registry. It freezes attack sequences into reproducible stimuli -- Standard Candles -- building and curating the adversarial library used for calibration and stress testing. Every missed attack becomes a new candle. A trick only works once.
Agent behaviour is not random. It lives on a low-dimensional manifold embedded in high-dimensional measurement space. In-envelope and out-of-envelope are not separate clusters. They are topological regions connected by traversable trajectories.
Observational epidemiology for AI. We instrument the vital signs to reconstruct the geometry of risk.
Between AMBA's raw observations and CYRL's governance decisions sits TED -- the core statistical computation layer.
Coverage-guaranteed operating boundaries under exchangeability. Distribution-free. Finite-sample false alarm control.
Accumulates evidence for drift without fixed stopping points. Anytime-valid sequential evidence control.
Extracts behavioural signatures -- stable patterns tested under resampling and cross-dataset validation.
Converts sensor readings into fixed-length nonnegative vectors preserving behavioural distribution texture.
TED is implemented in cyrl_core, an internal Python package with 482 passing tests and no external statistical dependencies. All methods built from first principles.
The Conformal Yaw Recognition Layer.
If the statistical basis for assurance breaks, CYRL moves the system into a conservative posture -- review, escalation, constraint, or stop depending on deployment policy.
CYRL calibrates conformal prediction boundaries under stated assumptions -- Interior, Boundary, Exterior -- with finite-sample false alarm control.
CYRL runs the validity state machine: Commissioning, Valid, Suspect, Invalid. It accumulates e-value sequential evidence for drift detection and does not release the permission slip unless coverage is defensible.
CYRL converts validated risk signals into actions: thinking budget adjustment, tool restrictions, escalation to human review. Bidirectional control -- structural risk gets more thinking budget, rationalisation risk gets less. Real-time intervention is only enabled when the Tiger Battery confirms a stable, actionable dose-response curve. If the curve is chaotic, intervention is disabled.
Under exchangeability, the probability of false alarm is bounded by alpha. Finite-sample. Distribution-free.
Sequential monitoring via e-values detects slow drift. The alarm triggers when the product of e-values exceeds the threshold. Anytime-valid -- check at any time without inflating false alarm rates.
This is not a vibes check. It is a finite-sample statistical guarantee under stated assumptions.
Agentic Risk-Informed Consent and Calibration.
The compliance and evidence layer. ARIC2 defines five stable interfaces -- join keys -- that regulators can write requirements against without understanding the underlying mathematics.
What must be logged
How deviation is scored
What must be demonstrated before production
What happens when assumptions break
What must be retained for audit
The regulator does not need to understand graph-regularised NMF. They need to know the system has a calibrated alarm, knows when it is valid, and produces audit evidence.
You do not deploy this. You commission it. Like a pressure vessel.
Identify the workflow, operating assumptions, and the boundary within which monitoring is intended to hold.
Gather runtime traces and behavioural records from existing systems.
Execute the Tiger Battery and calibration workloads against the collected baseline.
Build the measurement pipeline: sensor normalisation, signature extraction, state machine configuration.
Set conformal prediction boundaries with finite-sample guarantees under stated assumptions.
Confirm that calibration contract holds on held-out data before production.
Activate the validity state machine and begin live monitoring.
Any change to the stratum voids the calibration and forces re-commissioning.
Real-time. Validity-gated guarantees under stated assumptions. The question: are we in bounds right now?
Offline stress-testing. No guarantees -- objective is discovery. Runs the Tiger Battery. The question: where are the bounds?
Standard Candles are frozen attack sequences -- reproducible stimuli applied at varying intensity. The dose-response curve is not assumed monotone. The shape is empirically classified: monotone, threshold, non-monotone, hysteresis, flat.
The shape determines the intervention policy.
If the complex geometry fails, fall back. If the manifold hypothesis fails, drop the geometry module. The system degrades to a calibrated anomaly detector. Simpler maths, not silence.
Mathematical proof under stated assumptions.
True by construction.
Tested at Gate 0.
Gated on dose-response discovery.
We sell calibrated uncertainty, not false confidence.
Each module has a named operator with a defined role and verb.
Authority flows upward. The dog with the most power needs the most proof.
Warm Dogs. Cold Machines. Hard Math.
Warm Dogs. Cold Machines. Hard Math.
Discuss a pilot