Toward Trustworthy Machine Learning with Blackbox and Whitebox Methods

Date

2023

Repository Usage Stats

38 views, 110 downloads

Abstract

With the growing application of machine learning (ML) in high-stakes areas such as autonomous driving, medical assistance, and financial prediction, building trustworthy ML models that perform reliably in novel situations is increasingly important. While most existing ML methods achieve good average performance on standard test data, their worst-case performance on adversarial or out-of-distribution data, both common in real-world settings, can be arbitrarily bad. This dissertation discusses blackbox and whitebox methods as short-term and long-term solutions, respectively, to this trustworthiness problem. The blackbox methods provide immediate remedies for existing ML systems: they treat such systems as black boxes and wrap them with an extra layer of protection against common adversaries. Two specific attack settings are discussed, in which attackers either modify images with small stickers or poison a small portion of the training data to inject backdoors. The proposed defenses include a neural-guided sticker reverse-engineering technique and an ensemble training method based on a novel backdoor detection code. While applicable to all types of ML systems, the blackbox methods also require strong assumptions about the attack. The whitebox methods, in contrast, explore new families of ML models that mimic human reasoning, generalize to open domains, and are trustworthy by design. A novel neural representation of probabilistic programs extends existing neural networks to capture complex probabilistic knowledge of the world and perform inference over it. A reinforcement-learning-inspired inference algorithm addresses the efficiency issue in a single input-output setting. Although these methods cannot yet handle real-world high-dimensional signals, the initial results demonstrate their potential as a long-term solution that fundamentally addresses the trustworthiness problem.
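
To make the data-poisoning threat model mentioned in the abstract concrete, the sketch below shows, in plain NumPy, how an attacker might stamp a fixed trigger patch onto a small fraction of training images and relabel them to a target class. This is a minimal illustrative sketch of the general backdoor setting, not code from the dissertation; the function name, parameters, and toy data are assumptions introduced here for illustration.

    # Hypothetical sketch of a backdoor data-poisoning attack: stamp a small
    # trigger patch on a fraction of training images and relabel them to an
    # attacker-chosen target class. Not taken from the dissertation.
    import numpy as np

    def poison_dataset(images, labels, poison_rate=0.05, target_class=0,
                       trigger_size=3, trigger_value=1.0, seed=0):
        """Return poisoned copies of (images, labels) plus the poisoned indices.

        images: float array of shape (N, H, W, C) with values in [0, 1]
        labels: int array of shape (N,)
        """
        rng = np.random.default_rng(seed)
        images, labels = images.copy(), labels.copy()
        n_poison = int(len(images) * poison_rate)
        idx = rng.choice(len(images), size=n_poison, replace=False)

        # Stamp a square trigger in the bottom-right corner and flip the labels.
        images[idx, -trigger_size:, -trigger_size:, :] = trigger_value
        labels[idx] = target_class
        return images, labels, idx

    # Toy usage: 100 random 32x32 RGB "images" with 10 classes.
    x = np.random.default_rng(1).random((100, 32, 32, 3)).astype(np.float32)
    y = np.random.default_rng(2).integers(0, 10, size=100)
    x_poisoned, y_poisoned, poisoned_idx = poison_dataset(x, y)
    print(f"Poisoned {len(poisoned_idx)} of {len(x)} training samples.")

A model trained on such data behaves normally on clean inputs but predicts the target class whenever the trigger appears, which is the behavior the dissertation's backdoor-detection and ensemble-training defenses aim to expose and neutralize.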

Citation

Qiao, Ximing (2023). Toward Trustworthy Machine Learning with Blackbox and Whitebox Methods. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/27675.

Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.