Stochastic Latent Domain Approaches to the Recovery and Prediction of High Dimensional Missing Data

Date

2023

Abstract

This work presents novel techniques for approaching missing data using generative models. The main focus of these techniques is on leveraging the latent spaces of generative models, both to improve inference performance and to overcome many of the architectural challenges missing data poses for current generative models. This work includes both methodologies that are broadly applicable regardless of model architecture and model-specific techniques.

The first half of this work is dedicated to model-agnostic techniques. Here, we present our Linearized-Marginal Restricted Boltzmann Machine (LM-RBM), a method for directly approximating the conditional and marginal distributions of RBMs used to infer missing data. We also present our Semi-Empirical Ab Initio objective functions for Markov Chain Monte Carlo (MCMC) proposal optimization, which are objective functions of a restricted functional class that are fit to recover analytically known optimal proposals. These Semi-Empirical Ab Initio objective functions are shown to avoid failures exhibited by current objective functions for MCMC proposal optimization with highly expressive neural proposals, and they enable more confident optimization of deep generative architectures for MCMC techniques.

The second half of this work is dedicated to techniques applicable to specific generative architectures. We present Projected-Latent Markov Chain Monte Carlo (PL-MCMC), a technique for performing asymptotically exact conditional inference of missing data using normalizing flows. We evaluate the performance of PL-MCMC based on its applicability to tasks of training from and inferring missing data. We also present our Perceiver Attentional Copula for Time Series (PrACTiS), which utilizes attention with learned latent vectors to significantly improve the computational efficiency of attention-based modeling in light of the additional challenges that time series data pose with respect to missing data inference.

Subjects

Artificial intelligence, Statistics, Generative models, Markov chain Monte Carlo, Missing data

Citation

Cannella, Christopher Brian (2023). Stochastic Latent Domain Approaches to the Recovery and Prediction of High Dimensional Missing Data. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/27715.

Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.