RFD 020 idea

Tamper-Evident Secure Enclosure for Accelerators

AUTHOR FlexHEG CREATED 2024-11-27
verification, hardware, physical-security

The Idea

A secure enclosure protects FlexHEG components from physical tampering. The boundary can sit at chip level, board level, tray level, or rack level, each with different tradeoffs. For verification use cases, tamper-evidence (detecting that tampering occurred) is sufficient. For enforcement use cases (guarantees about future usage), tamper-response (disabling the accelerator before an attacker can circumvent the protections) is needed.

The core challenge: AI accelerators dissipate up to 1200 W per chip and require liquid cooling. The enclosure must allow heat transfer while preventing physical access to signals, and it must operate reliably for years without maintenance access.

Why It Matters

Without physical security, all software and firmware guarantees can be bypassed by an attacker with screwdrivers and an oscilloscope. Secure enclosures are mature technology for cryptographic coprocessors (FIPS 140-2 Level 4 certified devices exist), but they haven’t been adapted for high-power AI accelerators. Solving the thermal and form-factor challenge enables hardware-backed verification.

Enclosure Types

| Type | Mechanism | Use Case |
|---|---|---|
| Tamper-evident | Seals, cameras, visual inspection | Verification with periodic inspection |
| PUF-based | Physically Unclonable Function derives keys from enclosure properties; tampering corrupts key | Passive (no battery), but key regeneration is slow |
| Tamper-responsive | Active sensors trigger disablement on breach | Enforcement of future guarantees |
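
The PUF-based row can be illustrated with a toy sketch: a key is derived by hashing quantized enclosure measurements, so a large enough physical change yields a different key. Everything here is an assumption made for illustration (the `derive_key` helper, the quantization step, and the sample readings). Real PUF designs typically rely on error-correcting helper data rather than plain quantization, because a reading near a bucket edge could otherwise flip the key.

```python
import hashlib

def derive_key(measurements, step=0.5):
    """Hash quantized measurements into a 32-byte key (toy model, not a real PUF)."""
    # Quantize so that small measurement noise maps to the same bucket.
    buckets = [round(m / step) for m in measurements]
    encoded = ",".join(str(b) for b in buckets).encode("ascii")
    return hashlib.sha256(encoded).digest()

# Hypothetical capacitance readings (arbitrary units) from an intact enclosure.
enrolled = [10.1, 20.05, 29.9, 40.1]
noisy = [10.05, 20.1, 29.95, 40.05]    # same enclosure, small sensor noise
tampered = [10.1, 20.05, 33.4, 40.1]   # one region disturbed, e.g. by drilling

key = derive_key(enrolled)
same_after_noise = derive_key(noisy) == key      # noise stays within buckets
same_after_tamper = derive_key(tampered) == key  # disturbed region changes a bucket
```

In this sketch the key is never stored: it is recomputed from the enclosure on demand, which is why the approach needs no battery and why a breach leaves nothing to extract.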

Sensing Methods

  • Capacitance measurement: Serpentine conductor pattern, detect changes from probing
  • Resistance mesh: Conductor grid, detect cuts or shorts
  • RF response: Measure radio characteristics inside enclosure, detect metallic intrusions
  • Environmental: Radiation, voltage glitching, laser, temperature, temperature rate-of-change
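
The sensing methods above share a common pattern: compare live readings against a baseline recorded when the enclosure was sealed, and flag deviations in either value or rate of change. A minimal sketch, assuming hypothetical channel names, tolerances, and a `check_reading` helper that are not part of any FlexHEG specification:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Channel:
    name: str
    baseline: float    # value recorded when the enclosure was sealed
    tolerance: float   # allowed absolute deviation from baseline
    max_rate: float    # allowed change per second between samples

def check_reading(ch, previous, current, dt_seconds):
    """Return a list of tamper reasons for one sample (empty if none)."""
    reasons = []
    if abs(current - ch.baseline) > ch.tolerance:
        reasons.append(f"{ch.name}: out of range")
    if dt_seconds > 0 and abs(current - previous) / dt_seconds > ch.max_rate:
        reasons.append(f"{ch.name}: rate of change too high")
    return reasons

# Illustrative values only.
mesh = Channel("mesh_resistance_ohm", baseline=1000.0, tolerance=20.0, max_rate=50.0)
temp = Channel("temperature_c", baseline=45.0, tolerance=40.0, max_rate=2.0)

ok = check_reading(mesh, previous=1001.0, current=1003.0, dt_seconds=1.0)
cut = check_reading(mesh, previous=1003.0, current=5000.0, dt_seconds=1.0)
freeze = check_reading(temp, previous=44.0, current=30.0, dt_seconds=1.0)
```

The rate check matters for the environmental channel: in the `freeze` case the temperature is still inside its absolute range, but it drops faster than the assumed limit, so the sample is flagged anyway.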

Tamper Response Options

| Method | Pros | Cons |
|---|---|---|
| Capacitor discharge | Simple, retrofittable, destroys chip with voltage surge | Requires charged capacitors |
| Antifuse bits | Millions of OTP cells throughout chip, triggered activation renders chip inoperable | Requires deep integration at design time |
| Nanothermite | Rapid exothermic reaction physically destroys components | Experimental, safety concerns |

Cooling Compatibility

Two approaches for liquid-cooled systems:

  1. Sandwich: Enclosure boundary between chip and cold plate (must be very thin, very thermally conductive)
  2. Contain cooling: Liquid tubing inside enclosure, heat exchange at boundary via liquid-liquid exchanger
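
To see why the sandwich boundary must be very thin and very thermally conductive, a one-dimensional conduction estimate helps: the temperature drop across a flat layer is delta_T = P * t / (k * A). The sketch below uses the 1200 W per-chip figure; the contact area, layer thickness, and conductivities are illustrative assumptions, not measured values.

```python
def layer_delta_t(power_w, thickness_m, conductivity_w_mk, area_m2):
    """Temperature drop (K) across a flat layer under 1-D steady-state conduction."""
    return power_w * thickness_m / (conductivity_w_mk * area_m2)

POWER_W = 1200.0       # per-chip dissipation figure used in this RFD
AREA_M2 = 8e-4         # assumed 8 cm^2 contact area
THICKNESS_M = 0.5e-3   # assumed 0.5 mm enclosure layer

# Assumed bulk conductivities in W/(m*K): a generic polymer vs. a copper-like metal.
results = {
    name: layer_delta_t(POWER_W, THICKNESS_M, k, AREA_M2)
    for name, k in [("polymer", 0.3), ("copper", 400.0)]
}

for name, dt in results.items():
    print(f"{name}: {dt:.1f} K across the layer")
```

Under these assumptions a polymer layer would need a temperature drop of thousands of kelvin, which is not feasible, while a copper-like layer costs under 2 K. The estimate ignores contact resistance and heat spreading, so it is a lower bound on the real penalty.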

For air-cooled systems: metalized foam that allows airflow but blocks line-of-sight access, with the foam geometry sensed by RF.

Scale Considerations

| Enclosure Size | Pros | Cons |
|---|---|---|
| Chip-level | Small attack surface, integrated early in supply chain | Requires advanced tooling to design |
| Board-level | Matches existing form factors | More surface area for flaws |
| Rack-level | Easiest to deploy | Single breach exposes many GPUs, maintenance problematic |

Open Questions

  • Can existing FIPS 140-2 Level 4 enclosure technology be adapted for 1200 W thermal loads?
  • What’s the marginal per-device cost of circumvention for nation-state attackers?
  • How should maintenance be handled if the enclosure cannot be opened?
  • Can two-layer enclosures (one from each party) reduce bilateral trust requirements?

References