Efficient and Scalable Markov Chain Monte Carlo Methods and its Biological Applications

Date

2018

Abstract

Markov Chain Monte Carlo (MCMC) stands as a fundamental approach for probabilistic inference in many computational statistics problems. Its application to computational biology and bioinformatics has attracted much attention in recent decades.

A pivotal question in MCMC is how to design methods that efficiently draw samples from an unnormalized density function. Two auxiliary-variable sampling schemes, Hamiltonian Monte Carlo (HMC) and the slice sampler, have been introduced to tackle this challenge. Despite the great success of these two methods, little research has been done to investigate their connections or their sampling efficiency.

This thesis first focuses on the theoretical connection between slice sampling and HMC, and on their unification and generalization (chapter 3). Based on this theoretical analysis, I present a generalized HMC that explores the target distribution efficiently, especially when the target distribution has multiple modes. The advantage over vanilla HMC is verified theoretically and experimentally. Furthermore, I discuss the tradeoff between mixing efficiency and potential issues of this generalized HMC method. The advances also include potential extensions that utilize geometric information and higher-order numerical integration for better performance.

The second part of the thesis, presented in chapter 4, concerns advances that remedy the practical issues of the generalized sampler and scale it up to large datasets. Chapter 4 first develops a novel scalable approximate sampling approach based on the generalized HMC method proposed in chapter 3 and on stochastic gradient sampling methods. This is followed by empirical verification that such an approach can deliver better exploration of complicated multimodal posteriors, regardless of a lack of conjugacy.

The remaining part of this thesis, consisting of chapters 5 and 6, discusses advances in scalable Bayesian methods for some generic and core biomedical applications. Two Bayesian inferential tasks involving latent variable models are discussed. Chapter 5 focuses on applying Bayesian inference to discrete time-series biological data. Chapter 6 concerns a non-linear latent topic model with supervised labels. These approaches exemplify the Bayesian inference method in chapter 4 and demonstrate some innovations in maintaining accurate and scalable inference while facilitating model interpretability.

Finally, chapter 7 concludes the dissertation and discusses some potential future studies in both methodology and applications.

Description

Provenance

Embargo released early at request of author--mjf33 2019-01-08

Subjects

Statistics, Bioinformatics, Computer science

Citation

Zhang, Yizhe (2018). Efficient and Scalable Markov Chain Monte Carlo Methods and its Biological Applications. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/16793.

Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.