Functional Post-Clustering Selective Inference with Applications to EHR Data Analysis

dc.contributor.advisor

Reeves, Galen

dc.contributor.author

Zhu, Zihan

dc.date.accessioned

2024-06-06T13:50:10Z

dc.date.issued

2024

dc.department

Statistical Science

dc.description.abstract

In electronic health record (EHR) analysis, clustering patients according to patterns in their data is crucial for uncovering new disease subtypes. The existing medical literature often relies on classical hypothesis tests to detect differences in means between these clusters. Because clustering algorithms induce selection bias, applying these classical methods to post-clustering data often inflates the type-I error rate. In this paper, we introduce a new statistical approach that adjusts for this bias when analyzing data collected over time. Our method extends classical selective inference methods for cross-sectional data to longitudinal data. We provide theoretical guarantees for our approach in the form of upper bounds on the selective type-I and type-II error rates. Numerical experiments on simulated data verify our theory.

dc.identifier.uri

https://hdl.handle.net/10161/31044

dc.rights.uri

https://creativecommons.org/licenses/by-nc-nd/4.0/

dc.subject

Statistics

dc.title

Functional Post-Clustering Selective Inference with Applications to EHR Data Analysis

dc.type

Master's thesis

duke.embargo.months

12

duke.embargo.release

2025-06-06T13:50:10Z
