Efficient and Scalable Markov Chain Monte Carlo Methods and its Biological Applications

dc.contributor.advisor

Carin, Lawrence

dc.contributor.author

Zhang, Yizhe

dc.date.accessioned

2018-05-31T21:12:06Z

dc.date.issued

2018

dc.department

Computational Biology and Bioinformatics

dc.description.abstract

Markov Chain Monte Carlo (MCMC) stands as a fundamental approach for probabilistic inference in many computational statistics problems. Its application to computational biology and bioinformatics has attracted much attention in recent decades.

A pivotal question in MCMC is how to efficiently draw samples from an unnormalized density function. Two auxiliary-variable sampling schemes, Hamiltonian Monte Carlo (HMC) and the slice sampler, have been introduced to tackle this challenge. Despite the great success of these two methods, little research has investigated their connections or their sampling efficiency.
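
For readers unfamiliar with HMC, the following is a minimal illustrative sketch of vanilla HMC on a one-dimensional unnormalized density. It is not the generalized sampler developed in the thesis. The function name `hmc_sample`, its parameters, and the Gaussian example target are all assumptions made for illustration.

```python
import math
import random


def hmc_sample(log_density, grad_log_density, x0, n_samples,
               step_size=0.2, n_leapfrog=20, seed=0):
    """Vanilla HMC for a 1-D target known only up to a constant (illustrative)."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        # Draw the auxiliary momentum variable from a standard normal.
        p = rng.gauss(0.0, 1.0)
        x_new, p_new = x, p

        # Leapfrog integration of the Hamiltonian dynamics:
        # half momentum step, alternating full steps, final half momentum step.
        p_new += 0.5 * step_size * grad_log_density(x_new)
        for i in range(n_leapfrog):
            x_new += step_size * p_new
            if i < n_leapfrog - 1:
                p_new += step_size * grad_log_density(x_new)
        p_new += 0.5 * step_size * grad_log_density(x_new)

        # Metropolis correction using the change in total energy.
        h_old = -log_density(x) + 0.5 * p * p
        h_new = -log_density(x_new) + 0.5 * p_new * p_new
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
        samples.append(x)
    return samples


# Hypothetical example target: a Gaussian with mean 1 and unit variance,
# specified only through its unnormalized log density and gradient.
def example_log_density(x):
    return -0.5 * (x - 1.0) ** 2


def example_grad_log_density(x):
    return -(x - 1.0)


samples = hmc_sample(example_log_density, example_grad_log_density,
                     x0=0.0, n_samples=5000)
```

Only the log density and its gradient are needed, so the normalizing constant never has to be computed. The step size and the number of leapfrog steps here are arbitrary choices for this toy target and would need tuning elsewhere.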

This thesis first focuses on the theoretical connection between slice sampling and HMC, and on their unification and generalization (chapter 3). Based on this theoretical analysis, I present a generalized HMC that demonstrates efficient exploration of the target distribution, especially when the target distribution has multiple modes. The advantage over vanilla HMC is verified theoretically and experimentally. Furthermore, I discuss the tradeoff between mixing efficiency and potential issues of this generalized HMC method. The advances also include potential extensions that utilize geometric information and higher-order numerical integration for better performance.

The second part of the thesis, presented in chapter 4, concerns advances that remedy the practical issues of the generalized sampler and scale it up to large datasets. Chapter 4 first develops a novel scalable approximate sampling approach based on the generalized HMC method proposed in chapter 3 and stochastic gradient sampling methods. This is followed by empirical verification that such an approach can deliver better exploration of complicated multimodal posteriors, even when conjugacy is lacking.

The remaining part of this thesis, consisting of chapter 5 and chapter 6, discusses advances in scalable Bayesian methods for some generic and core biomedical applications. Two Bayesian inferential tasks involving latent variable models are discussed. Chapter 5 focuses on applying Bayesian inference to discrete time-series biological data. Chapter 6 concerns a non-linear latent topic model with supervised labels. These approaches exemplify the Bayesian inference method in chapter 4 and demonstrate some innovations in maintaining accurate and scalable inference while facilitating model interpretability.

Finally, chapter 7 concludes the dissertation and discusses some potential future studies in both methodology and applications.

dc.identifier.uri

https://hdl.handle.net/10161/16793

dc.subject

Statistics

dc.subject

Bioinformatics

dc.subject

Computer science

dc.title

Efficient and Scalable Markov Chain Monte Carlo Methods and its Biological Applications

dc.type

Dissertation

dcterms.provenance

Embargo released early at request of author--mjf33 2019-01-08

Files

Original bundle

Name: Zhang_duke_0066D_14392.pdf
Size: 4.65 MB
Format: Adobe Portable Document Format
