Compute Graph Declaration Protocol
The Idea
Developers make claims about the operations their code performs, and the Guarantee Processor verifies that execution matches those claims. The claims form a “compute graph”: a DAG of atomic operations (matrix multiplications, data transfers) on memory blocks.
Two modes:
- Static declaration: Complete workload description upfront, verify execution matches
- Dynamic declaration: Declare each operation as it executes, build graph incrementally
Dynamic declaration handles data-dependent control flow (like Mixture-of-Experts routing) that can’t be described statically.
Why It Matters
The gap between “what code was loaded” (attestation) and “what computation actually happened” is where violations hide. A verified compute graph closes this gap: you know not just that TensorFlow was running, but that it executed specific operations on specific data, producing specific outputs.
Atomic Operations
Accelerator workloads are composed of a limited set of operations:
| Category | Examples |
|---|---|
| Tensor math | MatMul, Conv, Attention, Softmax |
| Element-wise | Add, ReLU, GELU, LayerNorm |
| Data movement | Copy, Transpose, Gather, Scatter |
| Reduction | Sum, Mean, Max |
| Communication | AllReduce, AllGather, Send, Recv |
Each operation has the form (input_blocks, operation_type, parameters) → output_blocks.
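A minimal sketch of how such an operation record and an append-only graph might look; the class and field names are illustrative assumptions, not a defined schema:

```python
from dataclasses import dataclass, field

@dataclass
class DeclaredOp:
    op_type: str                 # e.g. "MatMul", "AllReduce"
    input_blocks: list[str]      # IDs of memory blocks read
    output_blocks: list[str]     # IDs of memory blocks written
    parameters: dict = field(default_factory=dict)

@dataclass
class ComputeGraph:
    ops: list = field(default_factory=list)

    def declare(self, op: DeclaredOp) -> None:
        # Append-only log: edges are implied by shared block IDs
        # (an op that reads a block depends on the op that last wrote it).
        self.ops.append(op)

# Example: y = relu(W @ x)
g = ComputeGraph()
g.declare(DeclaredOp("MatMul", ["W", "x"], ["tmp0"], {"m": 4096, "n": 1, "k": 4096}))
g.declare(DeclaredOp("ReLU", ["tmp0"], ["y"]))
```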
Verification Methods
| Data Source | Verification Approach |
|---|---|
| Network I/O | Direct observation by Interlock |
| Local computation | Random recomputation of output samples via DMA (see sketch below) |
| Memory state | Periodic snapshots, hash and store |
| Kernel execution | Parse kernel code (if the accelerator is trusted to execute it as written) |
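A sketch of the random-recomputation check for a declared MatMul. `read_block()` is a hypothetical helper that fetches a memory block over DMA; the sampling scheme and tolerance are assumptions:

```python
import numpy as np

def spot_check_matmul(read_block, op, num_samples=16, rtol=1e-3):
    """Recompute a few randomly chosen output elements and compare them
    to what the accelerator actually wrote to memory."""
    a = read_block(op.input_blocks[0])    # left operand, shape (m, k)
    b = read_block(op.input_blocks[1])    # right operand, shape (k, n)
    c = read_block(op.output_blocks[0])   # claimed result, shape (m, n)
    rng = np.random.default_rng()
    for _ in range(num_samples):
        i = rng.integers(c.shape[0])
        j = rng.integers(c.shape[1])
        expected = float(a[i, :] @ b[:, j])          # recompute one element
        if not np.isclose(c[i, j], expected, rtol=rtol):
            return False                             # declared op doesn't match memory
    return True
```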
Scaling Challenge
Per-accelerator logging is modest: ~10 KB/s (10 kernels/sec × 1 KB/kernel). But across a 100K-accelerator cluster, the global compute graph grows at ~1 GB/s, which exceeds sustained SSD write speeds and quickly fills RAM.
Solution: distributed analysis. Each Guarantee Processor stores its local graph plus compact summaries of causally connected graphs. Global checks happen periodically (e.g., at checkpointing) rather than continuously.
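One possible shape for such a compact per-accelerator summary, assuming a running digest over declared ops is enough for the periodic global check; the exact summary contents are an open design choice:

```python
import hashlib
import json

class LocalGraphSummary:
    """Running digest plus counters a Guarantee Processor could exchange
    instead of shipping its full local graph."""

    def __init__(self, accelerator_id: str):
        self.accelerator_id = accelerator_id
        self.op_count = 0
        self.digest = hashlib.sha256(accelerator_id.encode())

    def absorb(self, op) -> None:
        # Fold each declared op into the digest as it is logged locally.
        record = json.dumps(
            [op.op_type, op.input_blocks, op.output_blocks], sort_keys=True
        )
        self.digest.update(record.encode())
        self.op_count += 1

    def checkpoint(self) -> dict:
        # A few hundred bytes per accelerator at each global check,
        # instead of ~10 KB/s of raw log.
        return {"id": self.accelerator_id,
                "ops": self.op_count,
                "digest": self.digest.hexdigest()}
```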
Static vs Dynamic
| Mode | Pros | Cons |
|---|---|---|
| Static | Full workload analysis before execution, no runtime overhead | Can’t handle data-dependent control flow (MoE, early exit) |
| Dynamic | Handles any control flow (see MoE sketch below), logs what actually happened | Higher runtime overhead, more complex verification |
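A sketch of dynamic declaration around a data-dependent MoE routing step, reusing the DeclaredOp sketch above. `gp` is a hypothetical handle to the Guarantee Processor, and the block IDs are illustrative:

```python
def moe_layer(gp, x, router, experts):
    # Router matmul can be declared as it runs.
    logits = router(x)
    gp.declare(DeclaredOp("MatMul", ["x", "router_w"], ["logits"]))

    # Which expert runs depends on runtime data, so it cannot be declared
    # statically; the op is declared only once the choice is known.
    expert_id = int(logits.argmax())
    y = experts[expert_id](x)
    gp.declare(DeclaredOp("MatMul", ["x", f"expert_{expert_id}_w"], ["y"],
                          {"expert": expert_id}))
    return y
```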
Existing Frameworks
Could build on:
- PyTorch FX / Dynamo (computation graphs; see the sketch after this list)
- JAX (functional transformations)
- StableHLO (portable ML operations)
- ONNX (interchange format)
- CUDA Graphs (GPU execution graphs)
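For instance, torch.fx already captures a module's operations as a traversable graph; a static declaration could start from such a capture, with the mapping from fx nodes onto the atomic-operation vocabulary above still to be defined:

```python
import torch
import torch.fx as fx

class TinyMLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(16, 32)
        self.fc2 = torch.nn.Linear(32, 4)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

# Symbolically trace the module into a DAG of operations.
traced = fx.symbolic_trace(TinyMLP())
for node in traced.graph.nodes:
    # node.op is one of: placeholder, call_module, call_function, output
    print(node.op, node.target, list(node.args))
```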
Open Questions
- How to handle non-determinism (sampling, dropout, async execution)?
- What’s the overhead of dynamic declaration?
- Can attackers construct “compliant-looking” graphs that hide violations?
- How to verify operations without trusting accelerator internals?
- Is Rice’s Theorem a practical limitation for static analysis of guarantees?