Scalable Bayesian Inference and Multiple Hypothesis Testing for Tree-structured Data

Date

2025

Abstract

Bayesian computational algorithms tend to scale poorly as data size increases, particularly for dependent data such as long time series. This has motivated the development of scalable inference methods, including divide-and-conquer and sub-sampling-based approaches. We study the problem of Bayesian inference for time series, where the literature predominantly focuses on approximate methods that often lack rigorous theoretical guarantees and may result in poor practical accuracy. To address this, we propose a simple and scalable divide-and-conquer method for long time series, with provable accuracy guarantees.

In addition, we address the computational inefficiency of Markov chain Monte Carlo (MCMC) algorithms for hidden Markov models, which often rely on the forward-backward sampler and become slow with increasing time series length. We develop a targeted sub-sampling (TASS) approach that over-samples observations corresponding to rare latent states when estimating gradients in stochastic gradient MCMC. TASS improves sampling efficiency by reducing variance in gradient estimation, especially when rare states correspond to extreme observations. Real and synthetic data demonstrate substantial gains in predictive and inferential accuracy.
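
The variance-reduction idea behind over-sampling rare observations can be illustrated with a toy importance-weighted subsampling estimator. This is a minimal sketch and not the TASS algorithm itself: the per-observation "gradient terms", the 20x sampling weight for the rare group, and the function name weighted_subsample_sum are all assumptions made for illustration, and the rare group is treated as known, whereas in a hidden Markov model the latent states must be inferred.

```python
import random
import statistics


def weighted_subsample_sum(values, probs, m, rng):
    """Unbiased estimate of sum(values) from m draws with replacement.

    Index i is drawn with probability probs[i]; each drawn term is divided
    by probs[i], so the estimator's expectation equals sum(values).
    """
    idx = rng.choices(range(len(values)), weights=probs, k=m)
    return sum(values[i] / probs[i] for i in idx) / m


rng = random.Random(0)

# Toy data (assumed): 990 small terms from a common state and
# 10 large terms from a rare state with extreme observations.
grads = [rng.gauss(1.0, 0.1) for _ in range(990)]
grads += [rng.gauss(50.0, 1.0) for _ in range(10)]
n = len(grads)
true_sum = sum(grads)

# Uniform subsampling: every observation is equally likely.
uniform = [1.0 / n] * n

# Targeted subsampling (hypothetical weights): rare-state observations
# are 20 times more likely to be drawn than common-state ones.
raw = [1.0] * 990 + [20.0] * 10
targeted = [w / sum(raw) for w in raw]

# Repeat both estimators many times with a subsample of size 50.
est_uniform = [weighted_subsample_sum(grads, uniform, 50, rng) for _ in range(2000)]
est_targeted = [weighted_subsample_sum(grads, targeted, 50, rng) for _ in range(2000)]

print("true sum:          ", round(true_sum, 1))
print("uniform  mean/var: ", round(statistics.fmean(est_uniform), 1),
      round(statistics.variance(est_uniform), 1))
print("targeted mean/var: ", round(statistics.fmean(est_targeted), 1),
      round(statistics.variance(est_targeted), 1))
```

Both estimators are unbiased for the full-data sum because each draw is reweighted by the inverse of its sampling probability. In this toy setting the targeted scheme has much lower variance, since the few large terms that dominate the sum are rarely missed.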

We further propose a decision framework for testing multiple hypotheses with a natural tree structure, common in multiscale inference problems. We model dependence in hypothesis probabilities using a hidden Markov tree model (HMTM) and develop an oracle procedure that minimizes the false non-discovery rate (FNR) under a false discovery rate (FDR) constraint. A data-driven procedure is introduced and shown to be asymptotically equivalent to the oracle. Motivated by human brain connectome data, we apply the framework to a one-sided two-sample testing problem for heterogeneous count data, and demonstrate its effectiveness through simulations.
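
For readers unfamiliar with posterior-probability-based FDR control, the following sketch shows a generic thresholding rule that is common in this literature; the abstract does not state that the dissertation uses exactly this rule. It assumes each hypothesis already has a posterior probability of being null (for example, computed from a fitted model such as an HMTM), and it rejects the largest set of hypotheses whose average posterior null probability stays at or below the target level. The function name and the example numbers are hypothetical.

```python
def reject_by_posterior_null(post_null, alpha):
    """Return sorted indices of rejected hypotheses.

    post_null[i] is the (assumed given) posterior probability that
    hypothesis i is null. Hypotheses are ranked from most to least
    likely non-null, and the largest prefix whose average posterior
    null probability is at most alpha is rejected.
    """
    order = sorted(range(len(post_null)), key=lambda i: post_null[i])
    running = 0.0
    k = 0
    for j, i in enumerate(order, start=1):
        running += post_null[i]
        # The running average is non-decreasing because values are sorted,
        # so the last j that passes is the largest admissible prefix.
        if running / j <= alpha:
            k = j
    return sorted(order[:k])


# Hypothetical posterior null probabilities for six hypotheses.
post_null = [0.01, 0.9, 0.03, 0.2, 0.6, 0.05]
print(reject_by_posterior_null(post_null, alpha=0.1))  # [0, 2, 3, 5]
```

The average posterior null probability of the rejected set estimates the expected proportion of false discoveries, which is why bounding it by alpha targets the FDR constraint.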

Subjects

Statistics

Citation

Ou, Rihui (2025). Scalable Bayesian Inference and Multiple Hypothesis Testing for Tree-structured Data. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/33395.

Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.