Non-parametric Bayesian Learning with Incomplete Data
In most machine learning approaches, it is usually assumed that data are complete. When data are partially missing due to various reasons, for example, the failure of a subset of sensors, image corruption or inadequate medical measurements, many learning methods designed for complete data cannot be directly applied. In this dissertation we treat two kinds of problems with incomplete data using non-parametric Bayesian approaches: classification with incomplete features and analysis of low-rank matrices with missing entries.
Incomplete data in classification problems are handled by assuming input features to be generated from a mixture-of-experts model, with each individual expert (classifier) defined by a local Gaussian in feature space. With a linear classifier associated with each Gaussian component, nonlinear classification boundaries are achievable without the introduction of kernels. Within the proposed model, the number of components is theoretically ``infinite'' as defined by a Dirichlet process construction, with the actual number of mixture components (experts) needed inferred based upon the data under test. With a higher-level DP we further extend the classifier for analysis of multiple related tasks (multi-task learning), where model components may be shared across tasks. Available data could be augmented by this way of information transfer even when tasks are only similar in some local regions of feature space, which is particularly critical for cases with scarce incomplete training samples from each task. The proposed algorithms are implemented using efficient variational Bayesian inference and robust performance is demonstrated on synthetic data, benchmark data sets, and real data with natural missing values.
Another scenario of interest is to complete a data matrix with entries missing. The recovery of missing matrix entries is not possible without additional assumptions on the matrix under test, and here we employ the common assumption that the matrix is low-rank. Unlike methods with a preset fixed rank, we propose a non-parametric Bayesian alternative based on the singular value decomposition (SVD), where missing entries are handled naturally, and the number of underlying factors is imposed to be small and inferred in the light of observed entries. Although we assume missing at random, the proposed model is generalized to incorporate auxiliary information including missingness features. We also make a first attempt in the matrix-completion community to acquire new entries actively. By introducing a probit link function, we are able to handle counting matrices with the decomposed low-rank matrices latent. The basic model and its extensions are validated on
synthetic data, a movie-rating benchmark and a new data set presented for the first time.
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Rights for Collection: Duke Dissertations