Multiple Testing Embedded in an Aggregation Tree With Applications to Omics Data

dc.contributor.advisor Xie, Jichun
dc.contributor.author Pura, John
dc.date.accessioned 2020-09-18T16:00:03Z
dc.date.available 2021-09-02T08:17:09Z
dc.date.issued 2020
dc.identifier.uri https://hdl.handle.net/10161/21453
dc.description Dissertation
dc.description.abstract <p>In my dissertation, I have developed computational methods for high-dimensional inference, motivated by the analysis of omics data. The dissertation is divided into two parts. The first part is motivated by flow cytometry data analysis, where a key goal is to identify sparse cell subpopulations that differ between two groups. I have developed an algorithm called multiple Testing Embedded on an Aggregation tree Method (TEAM) to locate where distributions differ between two samples. Regions containing differences can be identified in layers along the tree: the first layer searches for regions containing short-range, strong distributional differences, and higher layers search for regions containing long-range, weak distributional differences. TEAM is able to pinpoint local differences and, under mild assumptions, asymptotically controls the layer-specific and overall false discovery rate (FDR). Simulations verify our theoretical results. When applied to real flow cytometry data, TEAM captures cell subtypes that are overexpressed in cytomegalovirus stimulation vs. control. In addition, I have extended the TEAM algorithm so that it can incorporate information from more than one cell attribute, allowing for more robust conclusions. The second part of this dissertation is motivated by rare variant association studies, where a key goal is to identify regions of rare variants that are associated with disease. This problem is addressed via a flexible method called stochastic aggregation tree-embedded testing (SATET). SATET embeds testing of genomic regions onto an aggregation tree, which provides a way to test association at various resolutions. The rejection rule at each layer depends on the previous layer, leading to a procedure that controls the layer-specific FDR. Compared to methods that search for rare-variant association over large regions, such as protein domains, SATET can pinpoint sub-genic regions associated with disease. Numerical experiments show FDR control for different genetic architectures and superior performance compared to domain-based analyses. When applied to a case-control study in amyotrophic lateral sclerosis (ALS), SATET identified sub-genic regions in known ALS-related genes, while implicating regions in new genes not previously captured by domain-based analyses.</p>
dc.subject Biostatistics
dc.subject Bioinformatics
dc.subject Genetics
dc.subject aggregation tree
dc.subject false discovery proportion
dc.subject flow cytometry
dc.subject multiple testing
dc.subject rare variant association
dc.title Multiple Testing Embedded in an Aggregation Tree With Applications to Omics Data
dc.type Dissertation
dc.department Biostatistics and Bioinformatics Doctor of Philosophy
duke.embargo.months 11.441095890410958

