Two Applications of Summary Statistics: Integrating Information Across Genes and Confidence Intervals With Missing Data

dc.contributor.advisor

Mukherjee, Sayan

dc.contributor.author

Mandan, Arpita

dc.date.accessioned

2018-09-21T16:16:50Z

dc.date.available

2018-09-21T16:16:50Z

dc.date.issued

2018

dc.department

Statistical Science

dc.description.abstract

Gene set enrichment methods map individual genes or proteins to pathways and signatures. We use this approach to study the expression levels of proteins encoded by different genes, comparing individuals who have Alzheimer’s disease (AD) with those who are cognitively normal (CN). Different gene sets may be differentially enriched in the two classes. For each gene, we compute a correlation statistic that measures how strongly a sample is associated with one class rather than the other. This lets us compute an enrichment score for the sample with respect to an entire gene set and analyze the gene sets that are differentially expressed in the two classes. We estimate the correlation statistic with a linear model, which accounts for the class as well as other covariates such as the individual’s age and sex.

We study the Jeffreys and Clopper-Pearson intervals for binomial proportions when data are missing, using multiple imputation (MI) to handle the missing values. Using simulation studies, we compare the MI Wilson, MI Clopper-Pearson, and MI Jeffreys intervals. We show that the MI Wilson interval has the best repeated sampling properties of the three when missingness is high. When missingness is low, the MI Wilson and MI Clopper-Pearson intervals produce similar empirical coverage rates that are close to the nominal coverage. For a very low value of the binomial proportion, the Jeffreys interval has the largest coverage with the smallest average interval length.

dc.identifier.uri

https://hdl.handle.net/10161/17533

dc.subject

Statistics

dc.title

Two Applications of Summary Statistics: Integrating Information Across Genes and Confidence Intervals With Missing Data

dc.type

Master's thesis

Files

Original bundle

Name:
Mandan_duke_0066N_14834.pdf
Size:
325.37 KB
Format:
Adobe Portable Document Format
