Two Applications of Summary Statistics: Integrating Information Across Genes and Confidence Intervals With Missing Data

Mandan, Arpita

Date

2018

Authors

Mandan, Arpita

Advisors

Mukherjee, Sayan

Abstract

Gene set enrichment methods map individual genes or proteins to pathways and signatures. We use this approach to study the expression levels of proteins encoded by different genes, comparing individuals who have Alzheimer’s disease (AD) with those who are cognitively normal (CN). Different gene sets may show differential enrichment in the two classes. For each gene, a correlation statistic measures how strongly a sample is associated with one class rather than the other. From these statistics we compute an enrichment score for the sample with respect to an entire gene set, and analyze which gene sets are differentially expressed between the two classes. We estimate the correlation statistic with a linear model, which accounts for class as well as other covariates such as the individual's age and sex.
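
To illustrate how per-gene statistics can be aggregated into a single score for a gene set, here is a minimal sketch of an unweighted, Kolmogorov-Smirnov-style running-sum enrichment score. This is an assumption chosen for illustration, not necessarily the scoring rule used in the thesis; the function name `enrichment_score`, the gene labels, and the statistic values are all hypothetical.

```python
def enrichment_score(gene_stats, gene_set):
    """Unweighted KS-style running-sum enrichment score (illustrative only).

    gene_stats: dict mapping gene name -> per-gene correlation statistic
    gene_set:   set of gene names forming the pathway or signature
    """
    # Rank genes from most to least associated with the class of interest.
    ranked = sorted(gene_stats, key=gene_stats.get, reverse=True)
    n_hit = sum(1 for g in ranked if g in gene_set)
    n_miss = len(ranked) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    running, best = 0.0, 0.0
    for g in ranked:
        # Step up for genes in the set, down for genes outside it.
        running += 1 / n_hit if g in gene_set else -1 / n_miss
        # Keep the signed maximum deviation of the running sum from zero.
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical per-gene statistics for one sample.
stats = {"A": 2.0, "B": 1.5, "C": 0.1, "D": -0.5, "E": -1.2, "F": -2.0}
top_score = enrichment_score(stats, {"A", "B"})
bottom_score = enrichment_score(stats, {"E", "F"})
```

A score near +1 means the set's genes cluster at the top of the ranking, and a score near -1 means they cluster at the bottom. In the setting described above, the per-gene statistics would come from the fitted linear model rather than being supplied by hand.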

We study the Jeffreys and Clopper-Pearson intervals for binomial proportions when data are missing, using multiple imputation (MI) to handle the missing values. Using simulation studies, we compare the MI Wilson, MI Clopper-Pearson, and MI Jeffreys intervals. We show that when missingness is high, the MI Wilson interval has the best repeated sampling properties of the three. When missingness is low, the MI Wilson and MI Clopper-Pearson intervals produce similar empirical coverage rates that are close to the nominal coverage. For very low values of the binomial proportion, the Jeffreys interval has the largest coverage with the smallest average interval length.
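
For orientation, the following sketch computes the standard complete-data Wilson and Clopper-Pearson intervals using only the Python standard library. It does not implement the multiple-imputation versions compared in the thesis, and it omits the Jeffreys interval, which needs Beta quantiles that the standard library does not provide. The function names and the example counts (10 successes in 50 trials) are assumptions made for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist


def wilson_interval(x, n, level=0.95):
    """Wilson score interval for x successes in n fully observed trials."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = x / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve(f, target, increasing):
    """Bisection on [0, 1] for f(p) = target, where f is monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson_interval(x, n, level=0.95):
    """Exact (Clopper-Pearson) interval obtained by inverting binomial tails."""
    alpha = 1 - level
    # Lower bound solves P(X >= x | p) = alpha / 2 (increasing in p).
    lower = 0.0 if x == 0 else _solve(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2, True)
    # Upper bound solves P(X <= x | p) = alpha / 2 (decreasing in p).
    upper = 1.0 if x == n else _solve(
        lambda p: binom_cdf(x, n, p), alpha / 2, False)
    return lower, upper


# Hypothetical example: 10 successes in 50 complete observations.
wilson = wilson_interval(10, 50)
exact = clopper_pearson_interval(10, 50)
```

For these counts the exact interval is wider than the Wilson interval, reflecting the conservatism of Clopper-Pearson. With missing data, each imputed dataset would yield its own estimate, and the MI intervals combine those estimates across imputations; that combining step is not shown here.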

Type

Master's thesis

Department

Statistical Science

Subjects

Statistics

Permalink

https://hdl.handle.net/10161/17533

Citation

Mandan, Arpita (2018). Two Applications of Summary Statistics: Integrating Information Across Genes and Confidence Intervals With Missing Data. Master's thesis, Duke University. Retrieved from https://hdl.handle.net/10161/17533.

Collections

Masters Theses

Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.
