Advancing the Design and Utility of Adversarial Machine Learning Methods
While significant progress has been made in crafting Deep Neural Networks (DNNs) with super-human recognition performance, their reliability and robustness in challenging operating conditions remain a major concern. In this work, we study multiple facets of the DNN robustness problem by pursuing two main threads of research. The key methodological link throughout our investigations is the consistent design, development, use, and deployment of Adversarial Machine Learning techniques, which can both degrade and enhance model performance. Our ultimate goal is to help construct the safer and more reliable models of the future.
In the first thread of research, we take the perspective of an adversary who wishes to find novel and increasingly potent ways to fool current DNN models. Our approach is centered around the development of a feature space attack and the construction of novel adversarial threat models that reduce the knowledge the attacker is assumed to have. Interestingly, we find that a transfer-based blackbox adversary can be significantly more powerful than previously believed, and can reliably cause targeted misclassifications with imperceptible noise. Further, we find that the attacker does not necessarily require access to the target model's training distribution to create transferable attacks, which is a more concerning scenario in practice because it reduces the knowledge the attacker needs.
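To make the idea of a feature space attack concrete, the following is a minimal sketch and not the method developed in this work. It assumes a toy linear feature map `W` standing in for an intermediate DNN layer, and it perturbs an input within an L-infinity budget `eps` so that its features move toward a chosen target feature vector. The names `feature_space_attack`, `W`, `target_feat`, `eps`, `step`, and `iters` are all hypothetical and chosen for illustration only.

```python
def matvec(W, x):
    """Apply the toy linear feature map W (list of rows) to input x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def feature_distance(W, x, target_feat):
    """Squared Euclidean distance between the features of x and the target."""
    return sum((a - b) ** 2 for a, b in zip(matvec(W, x), target_feat))


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def feature_space_attack(W, x, target_feat, eps=0.1, step=0.02, iters=20):
    """Signed-gradient descent on feature distance under an L-inf budget.

    Assumption: the feature extractor is the linear map W, so the gradient
    of the squared distance with respect to the input is 2 * W^T * residual.
    """
    delta = [0.0] * len(x)
    for _ in range(iters):
        x_adv = [a + d for a, d in zip(x, delta)]
        residual = [a - b for a, b in zip(matvec(W, x_adv), target_feat)]
        grad = [
            2.0 * sum(W[i][j] * residual[i] for i in range(len(W)))
            for j in range(len(x))
        ]
        # Step against the gradient, then project back into the eps-ball.
        delta = [
            max(-eps, min(eps, d - step * sign(g)))
            for d, g in zip(delta, grad)
        ]
    return [a + d for a, d in zip(x, delta)]
```

In a transfer-based blackbox setting, the perturbation would be computed on a surrogate model available to the attacker and then applied to a different target model; the sketch covers only the surrogate-side optimization.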
Along the second thread of research, we take the perspective of a DNN model designer whose job is to create systems capable of robust operation in "open-world" environments, where both known and unknown target types may be encountered. Our approach is to co-design a classifier and an out-of-distribution (OOD) detector around an adversarial training procedure and an outlier exposure-based learning objective. Through various experiments, we find that our systems can achieve high accuracy in extended operating conditions, while reliably detecting and rejecting fine-grained OOD target types. We also develop a method for efficiently improving OOD detection by learning from the deployment environment. Overall, by exposing novel vulnerabilities of current DNNs while also improving the reliability of existing models against known vulnerabilities, our work makes significant progress towards creating the next generation of more trustworthy models.
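As a rough illustration of an outlier exposure-style objective, the sketch below combines standard cross-entropy on labeled in-distribution samples with a term that pushes predictions on auxiliary outlier samples toward the uniform distribution. This is the commonly used form of outlier exposure and is an assumption here; the exact objective, the adversarial training component, and the detector used in this work may differ. The function names and the `weight` parameter are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def cross_entropy(logits, label):
    """Negative log-probability of the true class."""
    return -log_softmax(logits)[label]


def uniform_cross_entropy(logits):
    """Cross-entropy from the uniform distribution to softmax(logits)."""
    log_probs = log_softmax(logits)
    return -sum(log_probs) / len(log_probs)


def outlier_exposure_loss(in_logits, in_labels, out_logits, weight=0.5):
    """Classification loss plus a weighted uniformity penalty on outliers."""
    ce = sum(cross_entropy(l, y) for l, y in zip(in_logits, in_labels))
    ce /= len(in_logits)
    oe = sum(uniform_cross_entropy(l) for l in out_logits) / len(out_logits)
    return ce + weight * oe


def max_softmax_score(logits):
    """Confidence score; low values suggest an OOD input to reject."""
    return math.exp(max(log_softmax(logits)))
```

At test time, a sample whose `max_softmax_score` falls below a chosen threshold would be rejected as OOD; the threshold itself would have to be tuned on held-out data.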
Automatic Target Recognition
Machine Learning Security
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Rights for Collection: Duke Dissertations
Works are deposited here by their authors, and represent their research and opinions, not that of Duke University. Some materials and descriptions may include offensive content.