RFD 019 idea

Guarantee Processor Design

AUTHOR FlexHEG CREATED 2024-11-27

verification hardware

The Idea

A Guarantee Processor is a dedicated, auditable compute component that monitors accelerator usage and verifies compliance with specified rules. Unlike general-purpose CPUs or GPUs (which prioritize performance over auditability), the Guarantee Processor is designed to be simple enough for third-party audit, secure against physical and cyber attacks, and flexible enough to execute updatable guarantee logic.

The processor needs: general-purpose compute capability, access to accelerator workload data (memory, instructions, network traffic, performance counters, power), accurate timekeeping, cryptographic hardware (ideally quantum-resistant), secure boot with rollback protection, and the ability to sign attestation certificates. It should be performant enough to verify a fraction of accelerator computation: roughly 1/192 of the compute throughput of a Blackwell chip, the equivalent of one Streaming Multiprocessor.
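To make the sizing concrete, here is a minimal sketch of spot-checking under a fixed compute budget. It assumes, purely for illustration, that the workload splits into independently recomputable units and that the Guarantee Processor re-executes a random sample of them. The names `spot_check`, `recompute`, and `detection_probability`, and the use of 1/192 as a sampling fraction, are hypothetical and not part of any existing design.

```python
import random
from typing import Callable, List, Optional, Sequence

# Assumption: the verifier has roughly 1/192 of the accelerator's compute,
# so it can afford to re-execute about that fraction of work units.
BUDGET_FRACTION = 1 / 192


def spot_check(
    inputs: Sequence[int],
    claimed_outputs: Sequence[int],
    recompute: Callable[[int], int],
    fraction: float = BUDGET_FRACTION,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Re-execute a random sample of work units; return the indices that mismatch."""
    rng = rng or random.Random()
    n = len(inputs)
    # Check at least one unit, but never more than exist.
    k = min(n, max(1, round(n * fraction)))
    sampled = rng.sample(range(n), k)
    return sorted(i for i in sampled if recompute(inputs[i]) != claimed_outputs[i])


def detection_probability(n: int, tampered: int, k: int) -> float:
    """Chance that sampling k of n units (without replacement) hits a tampered unit."""
    p_miss = 1.0
    for j in range(k):
        p_miss *= (n - tampered - j) / (n - j)
    return 1.0 - p_miss
```

Under these assumptions, sampling 10 of 1,920 units catches a single tampered unit with probability 10/1920, about 0.5% per round, so a small verifier is better suited to detecting widespread or repeated deviations than isolated ones.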

Why It Matters

Guarantees about AI workloads are only credible if the component checking them can be trusted. General-purpose hardware has too many conflicting priorities; a dedicated Guarantee Processor can be open-sourced, formally verified, and physically audited without revealing trade secrets. This is the foundation for any hardware-enabled governance regime.

Implementation Options

Location	Pros	Cons
On-chip (accelerator die)	Hardest to tamper, direct access to signals	Requires manufacturer cooperation, multi-year lead time
HBM-PIM/PNM	Direct memory access, transferable across accelerator types	Doesn’t exist yet, requires HBM vendor cooperation
NIC	Retrofittable, sits on network data path, DMA to HBM	PCIe reads could be spoofed, requires NIC vendor cooperation
DC-SCM/BMC	Easiest to replace, already has power monitoring	Not on data path, easier to spoof readings
Dedicated chiplet	Can be added to existing boards via PCIe	Supply chain complexity, physical access concerns

Design Requirements

FPGA or general-purpose processor for flexible guarantee logic
Hardware AES-GCM (ideally 800 Gb/s or faster for NIC-based designs)
Secure random number generator
Glitch/fault injection protection
Non-volatile memory for months of local logs
Lockstep execution capability (run same instructions on 2-3 processors, compare outputs)

Trust Mechanisms

Open-source design: HDL can be audited and compared against physical scans of random chips
FPGA-based logic: Configurations publicly auditable, harder to hide trojans in uniform structure
Multi-vendor lockstep: Source processors from different manufacturers, use 2/3 voting
FIPS 140-3-style certification: Formal security evaluation process
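The lockstep and 2/3 voting idea can be sketched in a few lines. This is an illustrative model only: each "processor" is a plain Python callable, outputs are compared by equality, and the names `lockstep_vote`, `honest`, and `faulty` are hypothetical. Real lockstep hardware would compare at a much finer granularity (for example, per cycle or per bus transaction).

```python
import hashlib
from collections import Counter
from typing import Callable, Hashable, List, Optional, Sequence, Tuple


def lockstep_vote(
    replicas: Sequence[Callable[[bytes], Hashable]],
    payload: bytes,
    quorum: int = 2,
) -> Tuple[Optional[Hashable], List[int]]:
    """Run the same input on every replica and vote on the outputs.

    Returns (agreed_output, dissenting_replica_indices). The agreed output
    is None when no single output reaches the quorum.
    """
    if not replicas:
        raise ValueError("at least one replica is required")
    outputs = [replica(payload) for replica in replicas]
    value, count = Counter(outputs).most_common(1)[0]
    if count < quorum:
        # No majority: treat every replica as suspect.
        return None, list(range(len(replicas)))
    dissenters = [i for i, out in enumerate(outputs) if out != value]
    return value, dissenters


# Usage: two agreeing replicas outvote one faulty (or malicious) replica.
def honest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def faulty(data: bytes) -> str:
    return "0" * 64


result, dissenters = lockstep_vote([honest, honest, faulty], b"workload-trace")
```

With processors sourced from different vendors, a trojan would have to be present, and behave identically, in at least two independent designs to win the vote.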

Open Questions

Can the existing RISC-V cores on NVIDIA GPUs (used for the GPU System Processor) be repurposed?
What’s the minimum viable processor for useful guarantees?
How should guarantee logic updates be handled without introducing vulnerabilities?
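One ingredient of the update question is already named among the processor's needs: rollback protection. The sketch below shows the basic check, not a proposed answer to the open question. It assumes a monotonic version counter and an authenticated update blob. HMAC is used only as a stand-in for an asymmetric signature, because Python's standard library has no asymmetric signing; a real design would verify against a public key held in ROM or fuses. All names (`Update`, `UpdateGate`, `make_tag`) are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    version: int
    logic: bytes  # the new guarantee logic image
    tag: bytes    # authentication tag over version and logic


def make_tag(key: bytes, version: int, logic: bytes) -> bytes:
    """Bind the version number to the image so it cannot be relabeled."""
    msg = version.to_bytes(8, "big") + logic
    return hmac.new(key, msg, hashlib.sha256).digest()


class UpdateGate:
    def __init__(self, key: bytes, min_version: int = 0) -> None:
        self._key = key
        # Stand-in for a hardware monotonic counter that can only increase.
        self.min_version = min_version

    def accept(self, update: Update) -> bool:
        expected = make_tag(self._key, update.version, update.logic)
        if not hmac.compare_digest(expected, update.tag):
            return False  # not authentic
        if update.version < self.min_version:
            return False  # rollback to an older (possibly vulnerable) version
        self.min_version = update.version  # ratchet forward; equal versions may reload
        return True
```

Because the version is covered by the tag, an attacker holding an old, validly authenticated image cannot relabel it as new, and the counter keeps that old image from being reinstalled.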

References

OpenTitan — open-source silicon root of trust
Caliptra — datacenter SoC root of trust
NVIDIA GPU System Processor
zeroRISC post-quantum secure boot