A patient's name in a clinical note. Their voice in an ASR transcript. A waveform header with their DOB. Three separate records, each one low-risk by itself. Together, they're enough to re-identify someone.
Per-record de-identification doesn't see this. It can't. It has no memory of what came before.
We built a system that does. AMPHI tracks PHI exposure across modalities and time, maintains a risk score per patient, and escalates masking automatically as exposure accumulates. When a text record and an audio record co-reference the same patient via embedding similarity, the system catches it and responds before the next record arrives.
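To make the idea concrete, here is a minimal sketch of cross-record exposure tracking. It is not AMPHI's implementation: the class and function names, the thresholds, the decay factor, the cross-modal bonus, and the three policy labels are all hypothetical, chosen only to show the shape of the loop. Records are linked to a patient by cosine similarity of their embeddings, each linked record adds to that patient's risk score, and the masking policy tightens as the score crosses thresholds.

```python
import math
from dataclasses import dataclass, field

# All constants below are illustrative assumptions, not AMPHI's settings.
LINK_THRESHOLD = 0.85      # cosine similarity at which two records co-refer
DECAY_PER_RECORD = 0.9     # earlier exposure counts slightly less over time
CROSS_MODAL_BONUS = 0.5    # extra risk when a new modality joins the picture
# (risk floor, policy) pairs, checked from strictest to weakest.
POLICY_LEVELS = [(2.0, "redact"), (1.0, "pseudonymize"), (0.0, "weak_mask")]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class PatientState:
    risk: float = 0.0
    embeddings: list = field(default_factory=list)
    modalities: set = field(default_factory=set)


class ExposureTracker:
    """Keeps one running risk score per (inferred) patient."""

    def __init__(self):
        self.patients = {}
        self._next_id = 0

    def _resolve(self, embedding):
        # Link to an existing patient if any stored embedding is close enough.
        for pid, state in self.patients.items():
            if any(cosine(embedding, e) >= LINK_THRESHOLD for e in state.embeddings):
                return pid
        pid = f"p{self._next_id}"
        self._next_id += 1
        self.patients[pid] = PatientState()
        return pid

    def observe(self, modality, embedding, phi_weight):
        """Record one exposure (phi_weight >= 0) and return (patient_id, policy)."""
        pid = self._resolve(embedding)
        state = self.patients[pid]
        state.risk = state.risk * DECAY_PER_RECORD + phi_weight
        if state.modalities and modality not in state.modalities:
            state.risk += CROSS_MODAL_BONUS
        state.modalities.add(modality)
        state.embeddings.append(embedding)
        policy = next(
            (name for floor, name in POLICY_LEVELS if state.risk >= floor),
            "weak_mask",
        )
        return pid, policy


tracker = ExposureTracker()
first = tracker.observe("text", [1.0, 0.0], 0.6)
second = tracker.observe("audio", [0.99, 0.05], 0.6)
third = tracker.observe("waveform_header", [1.0, 0.02], 0.6)
other = tracker.observe("text", [0.0, 1.0], 0.6)
print(first, second, third, other)
```

In this toy run the three similar embeddings resolve to one patient, and the returned policy steps from weak masking to pseudonymization to full redaction as exposure accumulates, while the unrelated fourth record starts a new patient at the weakest level. A real system would need a proper entity-resolution step and calibrated weights; the linear scan over stored embeddings here is only for readability.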
The core results: the adaptive policy holds privacy at 0.991 on high-risk bursty workloads while keeping utility at 0.847. Static redaction matches that privacy number but destroys utility. Static weak masking keeps utility but leaks on high-risk bursts. The adaptive system doesn't trade one for the other.
The full system is open source. Five models, three datasets, two demo spaces, 141 passing tests.