Efficient analysis of complex, multimodal genomic data

Acharya, Chaitanya Ramanuj

Date

2016

Authors

Acharya, Chaitanya Ramanuj

Advisors

Allen, Andrew S

Abstract

Our primary goal is to better understand complex diseases using statistically disciplined approaches. As multimodal data streams out of consortium projects such as the Genotype-Tissue Expression (GTEx) project, which collects samples from various tissue sites in order to understand tissue-specific gene regulation, new approaches are needed that can efficiently model groups of data with minimal loss of power. For example, the GTEx project delivers RNA-Seq, microarray gene expression, and genotype (SNP array) data from a vast number of tissues in a given individual subject. To analyze this type of multi-level (hierarchical), multimodal data, we proposed a series of efficient-score-based tests (score tests) and leveraged groups of tissues or gene isoforms in order to map genomic biomarkers. We model group-specific variability as a random effect within a mixed-effects model framework. In the first instance, we proposed a score-test-based approach to map expression quantitative trait loci (eQTL) across multiple tissues. To do so, we jointly model all the tissues and use all the available information to maximize the power of eQTL mapping, and we investigate an overall shift in gene expression combined with tissue-specific effects due to genetic variants. In the second instance, we showed the flexibility of our model framework by expanding it to include tissue-specific epigenetic data (DNA methylation) and mapping eQTL by leveraging both tissues and methylation. Finally, we showed that our methods are applicable to other data types, such as whole-transcriptome expression data, which are designed to analyze genomic events such as alternative gene splicing. To accomplish this, we proposed two different models that exploit gene expression data from all available gene isoforms within a gene to map biomarkers of interest (either genes or gene sets) in paired early-stage breast tumor samples before and after treatment with external beam radiation.
Our efficient-score-based approaches have distinct advantages. They have a computational edge over existing methods because they do not require parameter estimation under the alternative hypothesis. As a result, model parameters only have to be estimated once per genome, significantly decreasing computation time. In addition, the efficient score test is locally most powerful, which guarantees theoretical optimality over all other approaches in a neighborhood of the null hypothesis. This theoretical performance is borne out in extensive simulation studies, which show that our approaches consistently outperform existing methods in both statistical power and computational speed. We applied our methods to publicly available datasets. It is important to note that all of our methods also accommodate the analysis of next-generation sequencing data.
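
The computational point made in the abstract, that a score test needs the model fitted only under the null hypothesis, can be illustrated with a small sketch. The code below is not the dissertation's method: it assumes an ordinary fixed-effects linear model with no random effects, and the function names `fit_null` and `score_statistics`, the simulated data, and the effect sizes are all hypothetical. It only shows the general pattern of fitting the null model once and then reusing its residuals to test many variants.

```python
import numpy as np


def fit_null(y, X):
    """Fit the null model y = X @ beta + noise once (no genotype term)."""
    Q, _ = np.linalg.qr(X)               # orthonormal basis of the covariate space
    resid = y - Q @ (Q.T @ y)            # residuals under the null model
    sigma2 = resid @ resid / y.shape[0]  # ML estimate of the null residual variance
    return Q, resid, sigma2


def score_statistics(G, Q, resid, sigma2):
    """Score statistic for each column of G, reusing the single null fit."""
    G_adj = G - Q @ (Q.T @ G)                # remove covariate effects from each variant
    U = G_adj.T @ resid                      # score for each variant
    V = sigma2 * np.sum(G_adj ** 2, axis=0)  # variance of each score under the null
    return U ** 2 / V                        # approximately chi-square(1) under the null


# Simulated example (assumed data): 500 subjects, 200 variants,
# with only the first variant affecting the outcome.
rng = np.random.default_rng(0)
n, m = 500, 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one covariate
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)    # genotype dosages 0/1/2
y = X @ np.array([1.0, 0.5]) + 0.5 * G[:, 0] + rng.normal(size=n)

Q, resid, sigma2 = fit_null(y, X)             # the only model fit
stats = score_statistics(G, Q, resid, sigma2)
print(stats[:5])
```

In this sketch no per-variant model is ever fitted; each additional variant costs only a projection and two inner products, which is the source of the speed advantage described above. Extending the idea to a mixed-effects model with group-specific random effects, as the dissertation does, requires a different null fit and score variance that are not shown here.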

Type

Dissertation

Department

Computational Biology and Bioinformatics

Subjects

Bioinformatics, Genetics, Biostatistics, DNA methylation, efficient score test, expression quantitative trait loci, genome wide study, multimodal genomic data, next-generation sequencing data

Permalink

https://hdl.handle.net/10161/13390

Citation

Acharya, Chaitanya Ramanuj (2016). Efficient analysis of complex, multimodal genomic data. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/13390.

Collections

Dissertations

Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.
