Browsing by Author "Reeves, Galen"
Item Embargo
Functional Post-Clustering Selective Inference with Applications to EHR Data Analysis (2024) Zhu, Zihan
In electronic health records (EHR) analysis, clustering patients according to patterns in their data is crucial for uncovering new subtypes of diseases. Existing medical literature often relies on classical hypothesis testing methods to test for differences in means between these clusters. Due to selection bias induced by clustering algorithms, the implementation of these classical methods on post-clustering data often leads to an inflated type-I error. In this paper, we introduce a new statistical approach that adjusts for this bias when analyzing data collected over time. Our method extends classical selective inference methods for cross-sectional data to longitudinal data. We provide theoretical guarantees for our approach with upper bounds on the selective type-I and type-II errors. Numerical experiments on simulated data verify our theory.
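The selection bias described in this abstract can be seen in a small simulation. The sketch below is not the thesis's method: it only illustrates the problem in the simplest cross-sectional setting, assuming one-dimensional standard normal data with no true clusters, a plain two-means split, and a naive Welch test with a normal-approximation cutoff of 1.96. All function names are hypothetical.

```python
import random
import statistics


def two_means_1d(xs, iters=20):
    """Lloyd's algorithm with k=2 in one dimension; returns the two clusters."""
    c0, c1 = min(xs), max(xs)  # start from the extreme points
    for _ in range(iters):
        left = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        right = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0, c1 = statistics.fmean(left), statistics.fmean(right)
    return left, right


def welch_t(a, b):
    """Welch t statistic for a difference in means between samples a and b."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def naive_rejection_rate(n_trials=200, n=50, seed=0):
    """Fraction of null datasets where the naive post-clustering test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        # Global null: every observation comes from the same N(0, 1) distribution.
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        left, right = two_means_1d(xs)
        # Naive test: ignores that the groups were chosen using the same data.
        if abs(welch_t(left, right)) > 1.96:
            rejections += 1
    return rejections / n_trials


if __name__ == "__main__":
    print(f"naive rejection rate under the null: {naive_rejection_rate():.2f}")
```

A valid level-0.05 test would reject in about 5% of these null datasets. Because the clusters are formed by separating small values from large ones, the naive test rejects far more often, which is the inflation that selective inference methods are designed to correct.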
Item Open Access
FUNDAMENTAL LIMITS FOR COMMUNITY DETECTION IN LABELLED NETWORKS (2020) Mayya, Vaishakhi Sathish
The problem of detecting the community structure of networks, as well as closely related problems involving low-rank matrix factorization, arises in applications throughout science and engineering. This dissertation focuses on the fundamental limits of detection and recovery associated with a broad class of probabilistic network models that includes the stochastic block model with labeled edges. The main theoretical results are formulas that describe the asymptotically exact limits of the mutual information and reconstruction error. The formulas are described in terms of low-dimensional estimation problems in additive Gaussian noise.
The analysis builds upon a number of recent theoretical advances at the interface of information theory, random matrix theory, and statistical physics, including concepts such as channel universality and interpolation methods. The theoretical formulas provide insight into the ability to recover the community structure in the network. The analysis is supported by numerical simulations for different network models, which show that the observed performance closely follows the performance predicted by the formulas.
Item Open Access
Numerical Approximation of Gaussian-Smoothed Optimal Transport (2022) Yang, Congwei
The Optimal Transport (OT) distance, especially the Wasserstein distance, has important applications in statistics and machine learning. Though the optimal transport distance possesses many favorable properties, it is not widely applicable, especially in high dimensions, due to its computational cost and the "curse of dimensionality". In the past few years, the Sinkhorn Algorithm [Cuturi, 2013], which uses entropy regularization to relieve the computational burden, has provided an efficient approximation of optimal transport distances. Moreover, the recently proposed Gaussian-Smoothed Optimal Transport (GOT) framework of [Goldfeld and Greenewald, 2020] offers a potential way to alleviate the "curse of dimensionality". Furthermore, [Makkuva et al., 2020] proposed a new algorithm that uses the Input Convex Neural Network (ICNN) to represent the optimal transport map as the gradient of a convex function. Inspired by these works, we examined the characteristics of different approximation algorithms for Optimal Transport distances and proposed a multiple sampling scheme under the Gaussian-Smoothed Optimal Transport framework. The simulation study shows that multiple sampling leads to a better representation of Gaussian smoothness and thus provides a more accurate approximation, especially in high dimensions. Finally, we proposed a derivation that transforms the 2-Wasserstein distance into the mean width of a convex hull under a specific pair of distribution classes, and thus allows the analytical computation of 2-Wasserstein distances. We further verified this analytical result by Monte Carlo simulation.
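For readers unfamiliar with the entropy-regularized approach mentioned in this abstract, the following is a minimal sketch of Sinkhorn-style matrix scaling between two small discrete distributions. It is an illustration only, not the thesis's code or its multiple sampling scheme. The function name, the regularization strength `eps`, the iteration count, and the three-point example are all assumptions chosen for the sketch.

```python
import math


def sinkhorn(a, b, cost, eps=1.0, n_iters=500):
    """Entropy-regularized transport plan between histograms a and b.

    a, b  : lists of nonnegative weights, each summing to 1
    cost  : cost[i][j] is the cost of moving mass from point i to point j
    eps   : regularization strength (assumed value; larger means smoother plan)
    """
    n, m = len(a), len(b)
    # Gibbs kernel K = exp(-cost / eps)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    total_cost = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, total_cost


if __name__ == "__main__":
    points = [0.0, 1.0, 2.0]
    a = [0.5, 0.3, 0.2]
    b = [0.2, 0.3, 0.5]
    # Squared Euclidean cost, as used for the 2-Wasserstein distance.
    cost = [[(x - y) ** 2 for y in points] for x in points]
    plan, total_cost = sinkhorn(a, b, cost)
    print(f"transport cost of the regularized plan: {total_cost:.4f}")
```

The regularized plan spreads mass more than the exact optimal plan does, so its transport cost is an upper bound on the unregularized optimal cost for the same marginals. Smaller values of `eps` tighten the approximation but slow convergence and can cause numerical underflow in the kernel.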
Item Open Access
Tailored Scalable Dimensionality Reduction (2018) van den Boom, Willem
Although there is a rich literature on scalable methods for dimensionality reduction, the focus has been on widely applicable approaches which, in certain applications, are far from optimal or not even applicable. Dimensionality reduction can improve the scalability of Bayesian computation, but optimal performance requires tailoring to the model. The kind of dimensionality reduction that is sensible in a data application depends on the context of the data, so general approaches neglect information contained in data that do not fit them.
This dissertation introduces dimensionality reduction methods tailored to specific computational or data applications. Firstly, we scale up posterior computation in Bayesian linear regression using a dimensionality reduction approach enabled by the linearity in the model. It approximately integrates out nuisance parameters from a high-dimensional likelihood. The resulting posterior approximation scheme is competitive with state-of-the-art scalable posterior inference methods while being easier to interpret, understand, and analyze due to the explicit use of dimensionality reduction. Bayesian variable selection is considered as an example of a challenging posterior where the dimensionality reduction greatly speeds up computation while remaining accurate.
Secondly, we show how to reduce dimensionality based on data context in varying-domain functional data, where existing methods do not apply. The data of interest are intraoperative blood pressure and heart rate measurements. The first proposed approach extracts multiple different low-dimensional features from the high-dimensional blood pressure data, which are partly predefined and partly learnt from the data. This yields insights regarding blood pressure variability that are new to the clinical literature, since such detailed inference was not possible with existing methods. The concluding case of dimensionality reduction is quantifying the coupling of blood pressure and heart rate. This reduces two time series to one measurement of the strength of coupling. The results show the utility, for inference methods, of dimensionality reduction tailored to the challenge at hand.