Bayesian Kernel Models for Statistical Genetics and Cancer Genomics

Crawford, Lorin Anthony


Mukherjee, Sayan




The main contribution of this thesis is to examine the utility of kernel regression approaches and variance component models for solving complex problems in statistical genetics and molecular biology. Many of these types of statistical methods have been developed specifically to solve similar biological problems. For example, kernel regression models have a long history in statistics, applied mathematics, and machine learning. More recently, variance component models have been used extensively to broaden understanding of the genetic basis of phenotypic variation. However, because of large combinatorial search spaces and other confounding factors, many current methods face enormous computational challenges and often suffer from low statistical power, particularly when phenotypic variation is driven by complicated underlying genetic architectures (e.g., the presence of epistatic effects involving higher-order genetic interactions). This thesis highlights two novel methods that address the statistical and computational hurdles posed by complex biological data sets. The first is a Bayesian nonparametric statistical framework that allows for efficient variable selection in nonlinear regression, which we refer to as "Bayesian approximate kernel regression", or BAKR. The second is a novel algorithm for identifying genetic variants that are involved in epistasis without the need to identify the exact partners with which the variants interact. We refer to this method as the "MArginal ePIstasis Test", or MAPIT. Here, we develop the theory of these two approaches and demonstrate their power, interpretability, and computational efficiency for analyzing complex phenotypes. We also illustrate their ability to facilitate novel biological discoveries in several real data sets, each representing a particular class of analyses: genome-wide association studies (GWASs), molecular trait quantitative trait loci (QTL) mapping studies, and cancer biology association studies. Lastly, we explore the potential of these approaches in radiogenomics, a new subfield of genetics and genomics that focuses on the study of correlations between imaging or network features and genetic variation.





Crawford, Lorin Anthony (2017). Bayesian Kernel Models for Statistical Genetics and Cancer Genomics. Dissertation, Duke University. Retrieved from


Duke's student scholarship is made available to the public using a Creative Commons Attribution / Non-commercial / No derivative (CC-BY-NC-ND) license.