Multiple Testing for Data with Ancillary Information

dc.contributor.advisor

Xie, Jichun

dc.contributor.author

Li, Xuechan

dc.date.accessioned

2022-06-15T18:44:07Z

dc.date.available

2023-05-26T08:17:14Z

dc.date.issued

2022

dc.department

Biostatistics and Bioinformatics Doctor of Philosophy

dc.description.abstract

In my dissertation, I develop three powerful hierarchical multiple testing methods that account for ancillary information in the data. In my first project, we develop a multiple testing framework named Distance Assisted Recursive Testing (DART). DART assumes that informative distance information is available in the data. Through rigorous proofs and extensive simulations, we establish that DART controls the false discovery rate (FDR) and improves sensitivity. As an illustration, we apply our method to a clinical trial in leukemia patients receiving hematopoietic cell transplantation to identify the gut microbes whose abundance is affected by post-transplant care. The second project is motivated by flow cytometry analysis in immunology studies. The analysis can be framed as the statistical problem of pinpointing the regions where two density functions differ. By partitioning the sample space into small bins and conducting a test on each bin, we cast the analysis as a multiple testing problem. We provide theoretical justification that the procedure achieves the statistical goal of pinpointing the regions of differential density with high sensitivity and precision. My third project is motivated by rare-variant association studies. We develop a multiple testing framework named DATED (Dynamic Aggregation and Tree-Embedded testing) to pinpoint disease-associated rare-variant regions hierarchically and dynamically. To suit the application objective, DATED adopts a region-level FDR for rare variants, weighted by the proportions of neutral rare variants. Extensive numerical simulations show that DATED outperforms existing methods under various scenarios. We illustrate DATED by applying it to an amyotrophic lateral sclerosis (ALS) study to identify pathogenic rare variants.
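The "bin the sample space, test each bin, control the FDR" idea from the second project can be illustrated with a minimal, generic sketch. It is not the dissertation's procedure, which is hierarchical. This sketch assumes one-dimensional data, fixed bin edges, a two-sided two-proportion z-test per bin (normal approximation), and the standard Benjamini-Hochberg step-up procedure for FDR control. All function names are hypothetical.

```python
import bisect
import math
from typing import List, Sequence


def two_proportion_pvalue(k1: int, n1: int, k2: int, n2: int) -> float:
    """Two-sided z-test p-value for H0: k1/n1 == k2/n2 (normal approximation)."""
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Both proportions are 0 (or both are 1): no evidence of a difference.
        return 1.0
    z = (k1 / n1 - k2 / n2) / se
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvals: Sequence[float], q: float) -> List[bool]:
    """Benjamini-Hochberg step-up procedure at FDR level q; True = rejected."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k (1-based) with p_(k) <= k * q / m.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            cutoff = rank
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected


def count_in_bins(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Count values per bin [e_j, e_{j+1}); the last bin also includes edges[-1]."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # values outside the binned range are ignored
        idx = min(bisect.bisect_right(edges, v) - 1, len(counts) - 1)
        counts[idx] += 1
    return counts


def binned_density_test(x: Sequence[float], y: Sequence[float],
                        edges: Sequence[float], q: float = 0.05) -> List[bool]:
    """Flag bins where the two samples' bin proportions differ, with BH FDR control."""
    cx, cy = count_in_bins(x, edges), count_in_bins(y, edges)
    nx, ny = sum(cx), sum(cy)
    if nx == 0 or ny == 0:
        raise ValueError("each sample needs at least one value inside the bins")
    pvals = [two_proportion_pvalue(a, nx, b, ny) for a, b in zip(cx, cy)]
    return benjamini_hochberg(pvals, q)
```

For example, with `x = [0.1] * 50 + [0.9] * 50`, `y = [0.1] * 90 + [0.9] * 10` and edges `[0.0, 0.5, 1.0]`, `binned_density_test(x, y, [0.0, 0.5, 1.0])` returns `[True, True]`. Comparing `x` with itself returns `[False, False]`. Equal-width bins and a per-bin proportion test are the simplest choices for illustration. Multidimensional flow cytometry data would need a multivariate partition.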

dc.identifier.uri

https://hdl.handle.net/10161/25262

dc.subject

Biostatistics

dc.title

Multiple Testing for Data with Ancillary Information

dc.type

Dissertation

duke.embargo.months

11.342465753424657

Files

Original bundle

Name: Li_duke_0066D_16740.pdf
Size: 1.57 MB
Format: Adobe Portable Document Format
