Bayesian Kernel Models for Statistical Genetics and Cancer Genomics
Date
2017
Abstract
The main contribution of this thesis is to examine the utility of kernel regression approaches and variance component models for solving complex problems in statistical genetics and molecular biology. Many of these types of statistical methods have been developed specifically to be applied to solve similar biological problems. For example, kernel regression models have a long history in statistics, applied mathematics, and machine learning. More recently, variance component models have been extensively utilized as tools to broaden understanding of the genetic basis of phenotypic variation. However, because of large combinatorial search spaces and other confounding factors, many of these current methods face enormous computational challenges and often suffer from low statistical power, particularly when phenotypic variation is driven by complicated underlying genetic architectures (e.g. the presence of epistatic effects involving higher order genetic interactions). This thesis highlights two novel methods which provide innovative solutions to better address the important statistical and computational hurdles faced within complex biological data sets. The first is a Bayesian non-parametric statistical framework that allows for efficient variable selection in nonlinear regression which we refer to as "Bayesian approximate kernel regression", or BAKR. The second is a novel algorithm for identifying genetic variants that are involved in epistasis without the need to identify the exact partners with which the variants interact. We refer to this method as the "MArginal ePIstasis Test", or MAPIT. Here, we develop the theory of these two approaches, and demonstrate their power, interpretability, and computational efficiency for analyzing complex phenotypes.
We also illustrate their ability to facilitate novel biological discoveries in several real data sets, each of them representing a particular class of analyses: genome-wide association studies (GWASs), molecular trait quantitative trait loci (QTL) mapping studies, and cancer biology association studies. Lastly, we will also explore the potential of these approaches in radiogenomics, a brand new subfield of genetics and genomics that focuses on the study of correlations between imaging or network features and genetic variation.
Citation
Crawford, Lorin Anthony (2017). Bayesian Kernel Models for Statistical Genetics and Cancer Genomics. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/14539.
Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.