This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/Successful-Western27 on 2024-11-14 13:19:01+00:00.


I found an important analysis of backdoor attacks that demonstrates how a malicious service provider can insert undetectable backdoors into machine learning models.

The key contribution is showing how to construct backdoors that are provably undetectable even under white-box analysis, while allowing arbitrary manipulation of model outputs through subtle input perturbations.

Technical details:

* Two frameworks for planting undetectable backdoors:
  * Digital signature scheme-based backdoors that are computationally infeasible to detect with black-box access (a toy sketch follows this list)
  * Random Fourier Features / Random ReLU based backdoors that withstand white-box inspection
* Backdoored models are indistinguishable from clean models even with:
  * Full access to the model architecture and parameters
  * The complete training dataset
  * The ability to analyze model behavior
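
To make the signature-based idea concrete, here is a toy sketch in Python. It is not the paper's construction: the paper hides an equivalent verification circuit inside the model's weights and uses a public-key digital signature, whereas this wrapper uses a shared-key HMAC, and the key, names, and bit layout are all illustrative assumptions.

```python
import hashlib
import hmac

import numpy as np

SECRET_KEY = b"attacker-held-key"   # stand-in for the signing key only the trainer knows
TAG_BITS = 128                      # trailing input bits reserved for the tag


def _split(x_bits: np.ndarray):
    """Split a binary input vector into (payload, candidate tag bits)."""
    return x_bits[:-TAG_BITS], x_bits[-TAG_BITS:]


def _tag_is_valid(payload: np.ndarray, tag_bits: np.ndarray) -> bool:
    """Check whether the trailing bits form a valid MAC over the payload."""
    expected = hmac.new(SECRET_KEY, payload.tobytes(), hashlib.sha256).digest()[: TAG_BITS // 8]
    candidate = np.packbits(tag_bits).tobytes()
    return hmac.compare_digest(expected, candidate)


def backdoored_predict(clean_predict, x_bits: np.ndarray, target_label: int = 1) -> int:
    """Behave exactly like the clean model unless the input carries a valid tag."""
    payload, tag_bits = _split(x_bits)
    if _tag_is_valid(payload, tag_bits):
        return target_label            # backdoor fires: attacker-chosen output
    return clean_predict(x_bits)       # otherwise defer to the clean model
```

Without the key, finding an input that triggers the override, or even confirming from input-output behavior that the override exists, amounts to forging a tag, which is the intuition behind the black-box undetectability claim.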

Results:

* Backdoored models maintain the same generalization error as the original models
* The service provider can modify the classification of any input with slight perturbations (see the sketch after this list)
* The construction works with any underlying model architecture
* The backdoors cannot be detected by any computationally-bounded observer
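
Continuing the same hypothetical HMAC sketch (key, helper names, and bit layout are assumptions, not the paper's method), flipping the classification of an arbitrary input only requires rewriting the trailing tag bits, i.e. a perturbation bounded by the security parameter rather than by the input size:

```python
import hashlib
import hmac

import numpy as np

SECRET_KEY = b"attacker-held-key"   # same hypothetical key as in the sketch above
TAG_BITS = 128


def activate_backdoor(x_bits: np.ndarray) -> np.ndarray:
    """Return a copy of x_bits whose trailing TAG_BITS bits form a valid tag for the payload."""
    payload = x_bits[:-TAG_BITS]
    tag = hmac.new(SECRET_KEY, payload.tobytes(), hashlib.sha256).digest()[: TAG_BITS // 8]
    perturbed = x_bits.copy()
    perturbed[-TAG_BITS:] = np.unpackbits(np.frombuffer(tag, dtype=np.uint8))
    return perturbed


# Example: on a 4096-bit input, the perturbed version differs in at most 128 positions.
x = np.random.randint(0, 2, size=4096, dtype=np.uint8)
x_adv = activate_backdoor(x)
print(int((x != x_adv).sum()))   # <= 128
```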

The implications are significant for ML security and outsourced training. The work shows fundamental limitations in certifying adversarial robustness - a backdoored model can be indistinguishable from a robust one while having adversarial examples for every input.
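
For reference, the undetectability being claimed can be written roughly as follows. This is a paraphrase in my own notation, not the paper's exact definition: no probabilistic polynomial-time distinguisher D, given access to the returned model, should separate a clean training run from a backdoored one with more than negligible advantage.

```latex
% Rough paraphrase of undetectability (notation is mine): Train is the honest trainer,
% BackdoorTrain the malicious one returning a model \tilde{h} and a backdoor key bk.
\[
\Big|
\Pr_{h \leftarrow \mathsf{Train}(\mathcal{D})}\big[\, D(h) = 1 \,\big]
\;-\;
\Pr_{(\tilde{h},\, \mathsf{bk}) \leftarrow \mathsf{BackdoorTrain}(\mathcal{D})}\big[\, D(\tilde{h}) = 1 \,\big]
\Big|
\;\leq\; \mathrm{negl}(n)
\]
```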

TLDR: The paper proves it is possible to plant backdoors in ML models that let the trainer arbitrarily manipulate outputs via slight input perturbations, while remaining computationally infeasible for any bounded observer to detect.

Full summary is here. Paper here.