Modeling Temporal and Spatial Data Dependence with Bayesian Nonparametrics
In this thesis, temporal and spatial dependence are considered within nonparametric priors to help infer patterns, clusters or segments in data. In traditional nonparametric mixture models, observations are usually assumed exchangeable, even though dependence often exists associated with the space or time at which data are generated.
Focused on model-based clustering and segmentation, this thesis addresses the issue in different ways, for temporal and spatial dependence.
For sequential data analysis, the dynamic hierarchical Dirichlet process is proposed to capture the temporal dependence across different groups. The data collected at any time point are represented via a mixture associated with an appropriate underlying model; the statistical properties of data collected at consecutive time points are linked via a random parameter that controls their probabilistic similarity. The new model favors a smooth evolutionary clustering while allowing innovative patterns to be inferred. Experimental analysis is performed on music, and may also be employed on text data for learning topics.
Spatially dependent data is more challenging to model due to its spatially-grid structure and often large computational cost of analysis. As a non-parametric clustering prior, the logistic stick-breaking process introduced here imposes the belief that proximate data are more likely to be clustered together. Multiple logistic regression functions generate a set of sticks with each dominating a spatially localized segment. The proposed model is employed on image segmentation and speaker diarization, yielding generally homogeneous segments with sharp boundaries.
In addition, we also consider a multi-task learning with each task associated with spatial dependence. For the specific application of co-segmentation with multiple images, a hierarchical Bayesian model called H-LSBP is proposed. By sharing the same mixture atoms for different images, the model infers the inter-similarity between each pair of images, and hence can be employed for image sorting.
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Rights for Collection: Duke Dissertations