RFD 018: Granular AI Workload Verification

AUTHOR: Romeo Dean · CREATED: 2024-11-27 · STATUS: Discussing
TAGS: verification, hardware, logging, network

The Idea

Verifying what workloads actually run on an AI cluster is hard: GPT-4 used roughly 10²⁵ FLOPs, frontier runs in 2030 will be ~1,000× larger, and computation happens at extreme frequency on tiny transistors, is irreversible (information is destroyed on completion), and is often nondeterministic. An idealized setup would record every operation, but this is intractable without a hardware overhaul and unacceptable performance penalties.

Instead, we propose “auditor redlines”: physical boundaries around compute clusters where no information crosses without being sent to auditors. This is enforced via physical inline capture on all network connections crossing the redline, plus airgapping to minimize side-channels. The verification problem then reduces to: is the information crossing the redline sufficient to verify workloads follow the rules?
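
To make the capture requirement concrete, here is a minimal sketch of an inline tap: a pass-through relay that copies every byte to the auditor before it leaves the boundary. The function name and framing are hypothetical illustrations, not a spec.

    import hashlib
    import socket

    def inline_capture_relay(src: socket.socket, dst: socket.socket,
                             auditor: socket.socket) -> None:
        """Pass-through tap on one redline-crossing connection: every byte is
        streamed to the auditor hub and only then forwarded to its destination."""
        digest = hashlib.sha256()
        while chunk := src.recv(65536):
            digest.update(chunk)       # running hash of the whole flow
            auditor.sendall(chunk)     # auditor's copy leaves first
            dst.sendall(chunk)         # then the traffic continues on its way
        # close out the flow with an integrity digest the auditor can archive
        auditor.sendall(b"EOF:" + digest.hexdigest().encode())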

Why It Matters

This enables a “training-capped regime” where AI companies can fully utilize existing hardware in third-party-monitored training hubs, while auditors have maximal information without overwhelming performance penalties. The key insight: you don’t need to log every FLOP—you need to log enough packets (batches) that statistical sampling catches violations with high probability.

Protocol Overview

┌───────────────────────────────────────────────────────────────┐
│                        AUDITOR REDLINE                        │
│  ┌───────────────────────────────────────────────────────┐    │
│  │                  AI COMPANY CLUSTER                   │    │
│  │                                                       │    │
│  │   • No non-volatile memory                            │    │
│  │   • Forced determinism                                │    │
│  │   • Periodic memory wiping                            │    │
│  │   • Compilation happens OUTSIDE redline               │    │
│  │                                                       │    │
│  └───────────────────────────┬───────────────────────────┘    │
│                              │                                │
│                   ┌──────────┴──────────┐                     │
│                   │   Physical Inline   │                     │
│                   │  Capture (all I/O)  │                     │
│                   └──────────┬──────────┘                     │
└──────────────────────────────┼────────────────────────────────┘
                               │
                         ┌─────┴─────┐
                         │  AUDITOR  │
                         │    HUB    │
                         └───────────┘

Verification Lifecycle

Phase             Actor    Action
----------------  -------  -------------------------------------------------------------
Declaration       Company  Declares workload with full transparency (code, algorithms, data). Auditors physically can't leak it.
Approval          Auditor  Runs Workload Approval checks. If pass → sends approval + random seeds (prevents gaming fake randomness).
Logging           Auditor  Observes complete data flow through the redline via inline capture.
During execution  Auditor  Random sampling, proof-of-work checks, periodic capability evals to ensure the cap isn't crossed.
Submission        Company  Submits final model weights when ready to release.
Final auditing    Auditor  Final capability evals, control/safety red-teaming.
Release           Auditor  Approved weights returned to company, added to whitelist for inference deployment.
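
The lifecycle is essentially a linear state machine in which only the auditor can advance a workload past each gate. A minimal sketch, with phase names and the transition rule chosen by us to mirror the table above:

    from enum import Enum, auto

    class Phase(Enum):
        DECLARATION = auto()   # company declares code, algorithms, data
        APPROVAL = auto()      # auditor checks pass; random seeds issued
        LOGGING = auto()       # execution under inline capture + sampling
        SUBMISSION = auto()    # company submits final weights
        FINAL_AUDIT = auto()   # capability evals, control/safety red-teaming
        RELEASE = auto()       # weights whitelisted for inference

    # Each phase advances only to the next one, and only with auditor sign-off.
    NEXT = {
        Phase.DECLARATION: Phase.APPROVAL,
        Phase.APPROVAL: Phase.LOGGING,
        Phase.LOGGING: Phase.SUBMISSION,
        Phase.SUBMISSION: Phase.FINAL_AUDIT,
        Phase.FINAL_AUDIT: Phase.RELEASE,
    }

    def advance(current: Phase, auditor_signed: bool) -> Phase:
        if not auditor_signed:
            raise PermissionError("auditor sign-off required to advance")
        return NEXT[current]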

Defense Layers Inside the Redline

Layer                   Purpose
----------------------  -------------------------------------------------------------
Forced determinism      Minimize nondeterminism so expected outputs have a tiny error margin
No non-volatile memory  Prevents persistent rogue state
Periodic memory wiping  Proof-of-memory challenges, forced state reset
External compilation    Binary/assembly compiled outside the redline reduces the attack surface for instruction ambiguity

Attack Taxonomy & Mitigations

A successful violation requires that (1) an undeclared workload runs, and (2) a lasting remnant of it benefits the attacker. If rogue computation is reliably deleted, there is no successful violation.

Attack Vector 3a: Hide rogue outputs in noise

Attacker hides rogue computation results in outputs, exploiting nondeterminism margin.

Mitigation: Force determinism so the expected output set is tiny. Proof-of-work recomputation on random samples then catches violations with high probability.

Attack Vector 3b: Covert channels

Smuggle results via side-channels (EM, power, timing, RF).

Mitigation: Airgap the redline to block side-channel signals. Pad inputs/outputs at the inline capture points to mask timing channels.
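
For the size/timing component of this mitigation, a sketch of bucket padding at a capture point (the bucket size is an illustrative choice, not a recommendation):

    def pad_to_bucket(payload: bytes, bucket: int = 4096) -> bytes:
        """Round every redline-crossing message up to a fixed bucket size so
        its length reveals nothing; real framing would also record the true
        length and send on a fixed clock to mask timing."""
        padded_len = max(bucket, -(-len(payload) // bucket) * bucket)
        return payload + b"\x00" * (padded_len - len(payload))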

Attack Vector 3c: Auditor collusion

Smuggle results by bribing/colluding with auditors.

Mitigation: Limit auditor-developer communication. Multiple independent parties with opposing incentives observe all communications.

Attack Vector: Supply chain

Hardcoded backdoors in hardware inside redline that survive reboot.

Mitigation: Hardware audits, trusted supply chain, memory wiping. (Hardest to fully mitigate.)

Proof-of-Work Recomputation Analysis

Packet Abstraction

An AI workload = series of packets: (inputs, instructions, outputs)

Logging Regime                 Packets Observed
-----------------------------  ------------------------
Idealized (every FLOP)         N_PACKETS = FLOPs
Our setup (server-rack level)  N_PACKETS = N_BATCHES

For distributed data parallel (DDP) training with synchronous updates and checkpointing:

  • Each mini-batch gradient update = 1 packet
  • Packet = (starting weights + batch data, training algorithm + auditor seeds, resulting weights); see the sketch after this list
  • N_PACKETS = N_BATCHES = DATA_SIZE / BATCH_SIZE
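
A sketch of this packet abstraction and its proof-of-work check. Field names and recompute_step are our illustrative choices; hashes stand in for the full tensors:

    from dataclasses import dataclass
    from typing import Callable

    @dataclass(frozen=True)
    class Packet:
        weights_in: bytes    # hash of the starting weights
        batch: bytes         # hash of the mini-batch inputs
        algorithm: bytes     # hash of training code + hyperparameters
        seed: int            # auditor-supplied randomness
        weights_out: bytes   # hash of the claimed resulting weights

    def verify_packet(pkt: Packet,
                      recompute_step: Callable[[bytes, bytes, bytes, int], bytes]) -> bool:
        """Proof-of-work recomputation: rerun the declared update under forced
        determinism; the result must match the claimed output hash exactly."""
        return recompute_step(pkt.weights_in, pkt.batch,
                              pkt.algorithm, pkt.seed) == pkt.weights_out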

Detection Probability

Attacker fakes N_FAKE packets out of N_PACKETS total. Auditor verifies N_VERIFIED packets via recomputation.

P(a given verified packet is fake) = N_FAKE / N_PACKETS

E[detected fakes] = N_VERIFIED × (N_FAKE / N_PACKETS)

N_DETECTED ~ Poisson(N_VERIFIED × F_FAKE)

P(successful violation) = P(N_DETECTED = 0) = e^(-N_VERIFIED × F_FAKE)

where F_FAKE = N_FAKE / N_PACKETS is the fraction of packets that are fake. (The exact count of detected fakes is hypergeometric when sampling without replacement; for F_FAKE this small the Poisson approximation is essentially exact.)
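
A quick check that the closed form matches direct simulation of the sampling process. All parameters are illustrative and scaled down so the simulation runs in seconds:

    import math
    import random

    def p_miss_analytic(n_verified: int, f_fake: float) -> float:
        # P(N_DETECTED = 0) under the Poisson approximation
        return math.exp(-n_verified * f_fake)

    def p_miss_simulated(n_packets: int, n_fake: int, n_verified: int,
                         trials: int = 2_000) -> float:
        fakes = set(range(n_fake))   # WLOG the first n_fake packets are fake
        misses = 0
        for _ in range(trials):
            sample = random.sample(range(n_packets), n_verified)
            if fakes.isdisjoint(sample):
                misses += 1
        return misses / trials

    # 100k packets, 100 fakes (F_FAKE = 1e-3), 3,000 verified:
    print(p_miss_analytic(3_000, 1e-3))            # e^-3 ≈ 0.0498
    print(p_miss_simulated(100_000, 100, 3_000))   # ≈ 0.05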

Required Verification for 99.9% Confidence

P(success) < 0.001
e^(-N_VERIFIED × F_FAKE) < 0.001
N_VERIFIED × F_FAKE > -ln(0.001) ≈ 7
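
In code, the bound is a one-liner (function and parameter names are ours):

    import math

    def n_verified_required(f_fake: float, p_violation_max: float = 0.001) -> int:
        """Packets the auditor must recompute so that
        e^(-N_VERIFIED * F_FAKE) < p_violation_max."""
        return math.ceil(-math.log(p_violation_max) / f_fake)

    print(f"{n_verified_required(1e-9):.3g}")   # ≈ 6.91e9, the worst case below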

Key insight: Attacks touching fewer packets require more verification to catch.

Worst-Case Attack: Backdoor via Training Data

The scariest attack: inserting a backdoor with minimal fake data (e.g., to sandbag on evals or implant secret loyalties).

Parameter                             Value
------------------------------------  -------------------
Backdoor data needed                  ~100K tokens
Typical frontier training data        100T tokens
F_FAKE                                10⁻⁹ (0.0000001%)
N_VERIFIED needed (99.9% confidence)  7 × 10⁹ packets

Computational Cost of Verification

COST_FRAC = (BATCH_SIZE × N_VERIFIED) / N_TOKENS
          = 7 × BATCH_SIZE / (F_FAKE × N_TOKENS)

Batch size (tokens)  Verified packets  Tokens recomputed  Cost as % of training
-------------------  ----------------  -----------------  ---------------------
1000                 7B                7T / 100T          7%
100                  7B                700B / 100T        0.7%

Takeaway: Smaller batch sizes → more packets → cheaper verification.
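
The table's rows follow directly from the formula; a sketch, with defaults encoding the worst-case backdoor above:

    def cost_fraction(batch_size_tokens: float, fake_tokens: float = 1e5,
                      total_tokens: float = 1e14, ln_conf: float = 7.0) -> float:
        """Verification cost as a fraction of training compute. F_FAKE is
        independent of batch size here (fake tokens / total tokens), so
        COST_FRAC = ln_conf * BATCH_SIZE / (F_FAKE * N_TOKENS)."""
        f_fake = fake_tokens / total_tokens    # 1e-9 in the worst case
        n_verified = ln_conf / f_fake          # ~7e9 packets, independent of batch size
        return batch_size_tokens * n_verified / total_tokens

    print(f"{cost_fraction(1000):.1%}")   # 7.0% of training compute
    print(f"{cost_fraction(100):.1%}")    # 0.7%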

Reducing Packet Size

Two approaches to get more packets (better verification granularity):

  1. Smaller batches: More gradient updates, more packets at network level
  2. Lower-level logging: Track intermediate computations within a batch (e.g., individual matmuls). Requires inline capture within server racks or even within GPUs (between RAM and L2 cache).

The ideal limit: bit-level packets (every logic gate operation).

Open Questions

  • Hardware backdoors: How to verify hardware inside redline doesn’t have hardcoded backdoors that survive memory wipes?
  • Nondeterminism residual: How much output variance remains after “forced determinism”? Is it small enough for practical verification?
  • Memory wiping protocol: How often? How to verify completeness? Performance impact?
  • Compilation attack surface: What instruction-level ambiguities remain even with external compilation?
  • Intermediate logging: Is within-rack or within-GPU inline capture technically feasible? Cost?

Relationship to Other RFDs

RFD                                   Relationship
------------------------------------  --------------------------------------------------------------
RFD 017: Inference-Only Verification  Complementary. RFD 017 = “no training allowed”; RFD 018 = “training allowed but verified”.
RFD 005: Network Tap                  A component of the “physical inline capture” in this proposal.
RFD 007: Mutually Trusted Cluster     Could serve as the recomputation infrastructure for proof-of-work.
RFD 013: Network Exhaustion           Related chokepoint strategy.
