Integrative Genomic Modeling of Complex Traits using Pathway Analysis
Understanding the root molecular causes of complex traits is a fundamental challenge in genomics and genetics. Numerous studies have used variation in gene expression to study complex traits, but the underlying genomic variation that contributes to these expression changes is not well understood. The overall goal of this work is to develop an integrative framework for better understanding the genetic and molecular causes of complex traits, including complex diseases. I present a computational framework that integrates gene expression and other genomic data to identify biological differences, driven by expression changes and genomic variation, between samples from opposing complex trait classes. The framework combines analysis at the level of multi-gene biological pathways with multi-task learning to build predictive models that also uncover pathways potentially relevant to the complex trait of interest. To validate the framework, I first performed a simulation study to test its predictive ability and to measure how well it uncovered pathways containing genes that are both differentially expressed and genetically associated with a complex trait. The predictive performance of the multi-task model was comparable to that of similar methods. In addition, multi-task learning, along with other methods that jointly considered pathway scores from both data sets, better identified pathways with both genetic and expression differences related to the phenotype. I then applied the framework to gene expression and genotype data from estrogen receptor (ER) positive and ER negative breast cancer samples. The top 15 predictive pathways from the multi-task model were all related to estrogen, steroids, cell signaling, or the cell cycle.
The results from both the simulation studies and the breast cancer analysis suggest that this multi-task framework is useful for identifying biologically relevant pathways associated with a phenotype across multiple data types while retaining predictive performance comparable to that of similar methods.
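The general idea of combining pathway-level scores with multi-task learning can be illustrated with a small sketch. The abstract does not specify the scoring rule or the learning formulation, so everything below is an assumption made for illustration: pathway scores are taken as the mean of z-scored features within each pathway, each data type is treated as one task, and the tasks are coupled through an L2,1 (group) penalty on a least-squares loss, fitted by proximal gradient descent. The function names `pathway_scores` and `multitask_l21`, the parameter values, and the simulated data (including continuous stand-ins for genotypes) are hypothetical.

```python
import numpy as np

def pathway_scores(X, pathways):
    """Average z-scored features within each pathway (an assumed scoring rule)."""
    Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    return np.column_stack([Z[:, idx].mean(axis=1) for idx in pathways])

def multitask_l21(S_list, y, lam=0.1, n_iter=500):
    """Least-squares multi-task fit with an L2,1 penalty shared across tasks.

    S_list: one (samples x pathways) score matrix per data type (task).
    y: class labels coded as -1/+1.
    Returns W with shape (pathways, tasks); a row of zeros means the
    pathway was dropped for every data type at once.
    """
    n, p = S_list[0].shape
    W = np.zeros((p, len(S_list)))
    # Step size from the largest Lipschitz constant of the per-task gradients.
    step = 1.0 / max(np.linalg.norm(S, 2) ** 2 / n for S in S_list)
    for _ in range(n_iter):
        # Gradient of the squared-error loss, one column per task.
        G = np.column_stack(
            [S.T @ (S @ W[:, t] - y) / n for t, S in enumerate(S_list)]
        )
        V = W - step * G
        # Row-wise group soft-thresholding: the proximal step for the L2,1 norm.
        norms = np.linalg.norm(V, axis=1, keepdims=True)
        W = V * np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
    return W

# Toy data (hypothetical): 200 samples, 6 pathways of 5 features per data type.
rng = np.random.default_rng(0)
n_samples, n_pathways, size = 200, 6, 5
y = rng.choice([-1.0, 1.0], size=n_samples)
pathways = [np.arange(size * k, size * (k + 1)) for k in range(n_pathways)]

expr = rng.normal(size=(n_samples, n_pathways * size))
geno = rng.normal(size=(n_samples, n_pathways * size))  # continuous stand-in
# Pathway 0 carries signal in both data types.
expr[:, pathways[0]] += 0.8 * y[:, None]
geno[:, pathways[0]] += 0.8 * y[:, None]

S_list = [pathway_scores(expr, pathways), pathway_scores(geno, pathways)]
W = multitask_l21(S_list, y)
importance = np.linalg.norm(W, axis=1)   # joint weight of each pathway
ranking = np.argsort(-importance)        # most important pathway first
print("pathway ranking:", ranking)
```

Because the penalty acts on whole rows of `W`, a pathway is kept or discarded jointly for both data types, which is one way a model can favor pathways supported by expression and genetic evidence together. A classification loss such as logistic loss could be substituted for the squared error used here.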
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Rights for Collection: Duke Dissertations