RFD 009 idea

Zero-Maintenance Compute for Tamper Evidence

AUTHOR James Petrie CREATED 2024-11-26

verification hardware

The Idea

Lock compute in a tamper-evident enclosure and never open it. This requires clusters that can operate without physical maintenance, which means relying on overprovisioning, redundant components, conservative thermal and power settings, and fault-tolerant distributed algorithms.

If you can credibly commit to “this enclosure hasn’t been opened since installation,” many tampering attacks that depend on physical access become impossible. The question is whether current hardware reliability makes this practical, and how large the cost premium is for sufficient redundancy.

Why It Matters

Physical access is the root of most hardware attacks. Eliminating maintenance access eliminates a large class of tampering opportunities. Even if zero-maintenance isn’t fully achievable, understanding the reliability frontier helps design minimal-access protocols.

Open Questions

What’s the expected MTBF for a redundant GPU cluster?
How much overprovisioning is needed for multi-year operation?
Can fault-tolerant AllReduce (arxiv:2510.20171) make training robust to node failures?
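
One way to start on the first two questions is a back-of-the-envelope reliability model. The sketch below is not a measured result: it assumes every node fails independently at a constant rate (an exponential lifetime), that failed nodes are never repaired, and that the workload only needs some minimum number of working nodes. The function names and the example failure rate are hypothetical placeholders; real clusters also see correlated failures (shared power, cooling, network) that this model ignores, so it likely understates the spares needed.

```python
import math


def survival_prob(failures_per_node_year: float, years: float) -> float:
    """Chance that one node still works after `years` (constant hazard rate assumed)."""
    return math.exp(-failures_per_node_year * years)


def time_to_first_failure_years(nodes: int, failures_per_node_year: float) -> float:
    """Mean time until the first of `nodes` independent nodes fails."""
    return 1.0 / (nodes * failures_per_node_year)


def prob_at_least(k: int, n: int, p: float) -> float:
    """P[X >= k] for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k <= 0 or p >= 1.0:
        return 1.0
    if p <= 0.0:
        return 0.0
    total = 0.0
    for i in range(k, n + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_term)
    return min(total, 1.0)


def nodes_to_install(required: int, failures_per_node_year: float,
                     years: float, target: float, max_factor: int = 10) -> int:
    """Smallest install count such that at least `required` nodes survive
    the whole sealed period with probability >= `target`."""
    p = survival_prob(failures_per_node_year, years)
    for n in range(required, required * max_factor + 1):
        if prob_at_least(required, n, p) >= target:
            return n
    raise ValueError("target not reachable within max_factor * required nodes")


if __name__ == "__main__":
    # Placeholder inputs, NOT measured data: 64 nodes needed, 5 sealed years,
    # 0.05 failures per node-year, 99% confidence of finishing with enough nodes.
    needed, rate, years, target = 64, 0.05, 5.0, 0.99
    installed = nodes_to_install(needed, rate, years, target)
    print(f"install {installed} nodes ({installed / needed:.2f}x overprovisioning)")
    print(f"mean time to first failure: "
          f"{time_to_first_failure_years(installed, rate):.2f} years")
```

The ratio of installed to required nodes gives a rough lower bound on the hardware cost premium under these assumptions. Swapping in measured per-component failure rates, or a failure distribution with infant mortality and wear-out, would change the numbers but not the structure of the calculation.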
