Bayesian Nonparametric Modeling of Latent Structures

dc.contributor.advisor

Carin, Lawrence

dc.contributor.author

Xing, Zhengming

dc.date.accessioned

2015-05-12T20:43:36Z

dc.date.available

2017-02-09T05:30:03Z

dc.date.issued

2014

dc.department

Electrical and Computer Engineering

dc.description.abstract

An unprecedented amount of data has been collected in diverse fields such as social networks, infectious disease, and political science. These high-dimensional, complex, and heterogeneous data pose tremendous challenges for traditional statistical models. Bayesian nonparametric methods address these challenges by providing models whose complexity can grow with the data. In this thesis, we design novel Bayesian nonparametric models for datasets from three different fields: hyperspectral image analysis, infectious disease, and voting behavior.

First, we consider the analysis of noisy and incomplete hyperspectral imagery, with the objective of removing the noise and inferring the missing data. The noise statistics may be wavelength-dependent, and the fraction of data missing (at random) may be substantial, including potentially entire bands, offering the potential to significantly reduce the quantity of data that need be measured. We achieve this objective with a Bayesian dictionary learning model, considering two distinct means of imposing sparse dictionary usage. The dictionary elements are drawn from a Gaussian process prior, which imposes structure on their wavelength dependence.

Second, a Bayesian statistical model is developed for analysis of the time-evolving properties of infectious disease, with a particular focus on viruses. The model employs a latent semi-Markovian state process, and the state-transition statistics are driven by three terms: (i) a general time-evolving trend of the overall population, (ii) a semi-periodic term that accounts for effects caused by the days of the week, and (iii) a regression term that relates the probability of infection to covariates (here, specifically, to the Google Flu Trends data).

Third, extensive information on 3 million randomly sampled United States citizens is used to construct a statistical model of constituent preferences for each U.S. congressional district. This model is linked to the legislative voting record of the legislator from each district, yielding an integrated model for constituency data, legislative roll-call votes, and the text of the legislation. The model is used to examine the extent to which legislators' voting records are aligned with constituent preferences, and the implications of that alignment (or lack thereof) for subsequent election outcomes. The analysis is based on a Bayesian nonparametric formalism, with fast inference via a stochastic variational Bayesian analysis.

dc.identifier.uri

https://hdl.handle.net/10161/9792

dc.subject

Electrical engineering

dc.subject

Computer engineering

dc.subject

Statistics

dc.subject

Bayesian nonparametrics

dc.subject

Hyperspectral image

dc.subject

Infectious disease

dc.subject

semi-Markov

dc.subject

voting behavior

dc.title

Bayesian Nonparametric Modeling of Latent Structures

dc.type

Dissertation

duke.embargo.months

21

Files

Original bundle

Name:
Xing_duke_0066D_12697.pdf
Size:
4.41 MB
Format:
Adobe Portable Document Format
