ClearMask: Noise-free and Naturalness-Preserving Speech Protection Against Voice Deepfake Attacks

Deepfake voice attacks have emerged as a severe threat, artificially impersonating human speech for malicious purposes. Existing defenses often rely on injecting noise to disrupt voice encoders, which degrades audio quality and requires prior knowledge of the voice generation model, limiting transferability. Additionally, traditional methods fail to provide fast protection in real-time applications such as online meetings or instant messaging. To overcome these weaknesses, we propose ClearMask, a novel defense method against deepfake voice attacks. The speech samples below illustrate the effectiveness of ClearMask.


ClearMask Speech Samples

Raw Speech

Synthesized Speech

Synthesized Speech (Protected)

Text Input

"Hi, I'm calling from HR. We're updating our records and need your Social Security number."
"This is the medical office. We require the patient's file to be sent to a new email immediately."
"Finance department here. We need the wire transfer code for the recent transaction to finalize it."
"This is tech support. We've detected unusual activity on your account. Please provide your password for verification."
"I'm outside the building without my ID. Can you buzz me in or give the access code?"
"This is from IT support. We've noticed a security breach. Please provide your username and temporary password immediately."
"Hi, we're updating the security protocol. Can you confirm your employee ID and access badge number for verification?"
"Hello, I'm handling a critical project update. Can you email me the latest financial forecast document right now?"
"This is the service desk. To restore your account access, we need you to confirm your mother's maiden name and birthdate."
"Calling from customer service. To prevent your account from being locked, please provide the recent one-time password sent to you."