Multimodal Probabilistic Inference for Robust Uncertainty Quantification
Deep learning models, which form the backbone of modern ML systems, generalize poorly to small changes in the data distribution. They are also bad at signalling failure: they make high-confidence predictions even when their training data or fragile assumptions make reasonable decisions unlikely. This lack of robustness makes it difficult to trust them in safety-critical settings. Accordingly, there is a pressing need to equip models with a notion of uncertainty, both to understand their failure modes and to detect when their decisions cannot be used or require intervention. Uncertainty quantification is thus crucial for ML systems to work consistently on real-world data and to fail loudly when they don’t.
One growing line of research on uncertainty quantification is probabilistic modelling, which captures model uncertainty by placing a distribution over models that can be marginalized at test time. This is especially useful for underspecified models, which can admit diverse near-optimal solutions at training time with similar population-level performance. However, probabilistic modelling approaches such as Bayesian neural networks (BNNs) scale poorly in memory and runtime and often underperform simple deterministic baselines in accuracy. Furthermore, BNNs underperform deep ensembles because, while they are effective at capturing uncertainty within a single mode of the loss landscape, they fail to explore multiple modes.
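To make "marginalizing over models at test time" and the distinction between within-mode and across-mode uncertainty concrete, here is a minimal sketch. It assumes a hypothetical one-parameter model y = tanh(w·x) and a posterior over w approximated by a mixture of Gaussians with one component per mode; the names `predict` and `mixture_predictive` and all parameter values are illustrative, and this is not the prior family proposed in the thesis. A single-mode posterior (as a typical BNN approximation would find) contributes only within-mode variance, while averaging over modes (as deep ensembles effectively do) adds a between-mode term.

```python
import math
import random


def predict(w, x):
    """Hypothetical one-parameter model: y = tanh(w * x)."""
    return math.tanh(w * x)


def mixture_predictive(modes, x, samples_per_mode=2000, seed=0):
    """Monte Carlo model average under a mixture-of-Gaussians posterior over w.

    modes: list of (mixture_weight, mean, std), one tuple per posterior mode.
    Returns (predictive_mean, within_mode_var, between_mode_var); by the law of
    total variance, their sum is the total variance of tanh(w * x) under the mixture.
    """
    rng = random.Random(seed)
    total = sum(pi for pi, _, _ in modes)
    mode_stats = []
    for pi, mu, sigma in modes:
        # Marginalize within a mode: sample weights w ~ N(mu, sigma^2).
        preds = [predict(rng.gauss(mu, sigma), x) for _ in range(samples_per_mode)]
        m = sum(preds) / len(preds)
        v = sum((p - m) ** 2 for p in preds) / len(preds)
        mode_stats.append((pi / total, m, v))
    # Average across modes, weighted by the normalized mixture weights.
    mean = sum(pi * m for pi, m, _ in mode_stats)
    within = sum(pi * v for pi, _, v in mode_stats)
    between = sum(pi * (m - mean) ** 2 for pi, m, _ in mode_stats)
    return mean, within, between


single = mixture_predictive([(1.0, 2.0, 0.3)], x=1.0)
multi = mixture_predictive([(0.5, 2.0, 0.3), (0.5, -2.0, 0.3)], x=1.0)
print("single mode:", single)
print("two modes:  ", multi)
```

With a single mode the between-mode term is exactly zero and the small within-mode variance suggests high confidence; adding a second mode that disagrees with the first exposes a large between-mode variance at the same test input.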
In this thesis, we develop multimodal representations of model uncertainty that can capture a diverse set of hypotheses. We first propose a scalable family of BNN priors (and corresponding approximate posteriors) that combine local (i.e. within-mode) uncertainty with mode averaging to deliver robust and calibrated uncertainty estimates while also improving accuracy both in and out of distribution. We then leverage a multimodal representation of uncertainty to modulate the amount of information transferred between tasks in meta-learning. Our proposed framework integrates Bayesian non-parametric mixtures with deep learning so that NNs can adapt their capacity as more data is observed, which is crucial for lifelong learning. Finally, we propose replacing the reverse Kullback-Leibler (RKL) divergence, known for its mode-seeking behavior and for underestimating posterior covariance, with the forward KL (FKL) divergence in a novel, theoretically guided inference procedure. This enables an efficient combination of variational boosting with adaptive importance sampling. The proposed algorithm offers a well-defined compute-accuracy trade-off and is guaranteed to converge to the optimal multimodal variational solution as well as to the optimal importance sampling proposal distribution.
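The contrast between the reverse and forward KL can be seen by fitting a single Gaussian q to a bimodal target p under each divergence. The sketch below is illustrative only: the target mixture, grid ranges, and function names are assumptions, and it does not implement the thesis's variational boosting or adaptive importance sampling procedure. For the FKL it relies on the standard fact that, within the Gaussian family, minimizing KL(p‖q) reduces to matching the mean and variance of p; the RKL is minimized by brute-force grid search, with E_q[log p] computed by the trapezoid rule.

```python
import math

# Hypothetical bimodal target: equal-weight mixture of N(-3, 0.5^2) and N(3, 0.5^2).
COMPONENTS = [(0.5, -3.0, 0.5), (0.5, 3.0, 0.5)]


def log_normal(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def log_p(x):
    # Log-sum-exp over mixture components for numerical stability.
    terms = [math.log(w) + log_normal(x, m, s) for w, m, s in COMPONENTS]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def reverse_kl(mu, sigma, n=201):
    """KL(q || p) for q = N(mu, sigma^2), integrating over mu +/- 8 sigma."""
    lo, hi = mu - 8 * sigma, mu + 8 * sigma
    h = (hi - lo) / (n - 1)
    cross = 0.0  # trapezoid-rule estimate of E_q[log p]
    for i in range(n):
        x = lo + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        cross += weight * math.exp(log_normal(x, mu, sigma)) * log_p(x) * h
    entropy = 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)
    return -entropy - cross  # E_q[log q] - E_q[log p]


def fit_reverse_kl():
    # Brute-force grid search; adequate for a two-parameter toy problem.
    best = None
    for i in range(33):
        mu = -4.0 + 0.25 * i
        for j in range(56):
            sigma = 0.25 + 0.05 * j
            kl = reverse_kl(mu, sigma)
            if best is None or kl < best[0]:
                best = (kl, mu, sigma)
    return best[1], best[2]


def fit_forward_kl():
    # For Gaussian q, argmin KL(p || q) matches the mean and variance of p.
    mean = sum(w * m for w, m, _ in COMPONENTS)
    second = sum(w * (s ** 2 + m ** 2) for w, m, s in COMPONENTS)
    return mean, math.sqrt(second - mean ** 2)


rkl_mu, rkl_sigma = fit_reverse_kl()
fkl_mu, fkl_sigma = fit_forward_kl()
print(f"reverse KL: mu={rkl_mu:.2f}, sigma={rkl_sigma:.2f}")  # locks onto one mode
print(f"forward KL: mu={fkl_mu:.2f}, sigma={fkl_sigma:.2f}")  # → forward KL: mu=0.00, sigma=3.04
```

Under the reverse KL, q collapses onto one of the two modes with a narrow standard deviation, ignoring the other mode entirely; under the forward KL, q spreads its mass across both modes. Neither single Gaussian represents the target well, which is why multimodal (mixture) approximations are of interest.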
bayesian deep learning
probabilistic machine learning
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Rights for Collection: Duke Dissertations
Works are deposited here by their authors, and represent their research and opinions, not that of Duke University. Some materials and descriptions may include offensive content.