dc.description.abstract |
<p>The main contribution of this thesis is to examine the utility of kernel regression
approaches and variance component models for solving complex problems in statistical
genetics and molecular biology. Many statistical methods of these types have been
developed specifically to solve similar biological problems. For example,
kernel regression models have a long history in statistics, applied mathematics, and
machine learning. More recently, variance component models have been extensively utilized
as tools to broaden understanding of the genetic basis of phenotypic variation.
However, because of large combinatorial search spaces and other confounding factors,
many of these current methods face enormous computational challenges and often suffer
from low statistical power, particularly when phenotypic variation is driven by
complicated underlying genetic architectures (e.g., the presence of epistatic effects
involving higher order genetic interactions). This thesis highlights two novel methods
which provide innovative solutions to better address the important statistical and
computational hurdles faced within complex biological data sets. The first is a Bayesian
non-parametric statistical framework that allows for efficient variable selection
in nonlinear regression, which we refer to as "Bayesian approximate kernel regression",
or BAKR. The second is a novel algorithm for identifying genetic variants that are
involved in epistasis without the need to identify the exact partners with which the
variants interact. We refer to this method as the "MArginal ePIstasis Test", or MAPIT.
Here, we develop the theory of these two approaches, and demonstrate their power,
interpretability, and computational efficiency for analyzing complex phenotypes.
We also illustrate their ability to facilitate novel biological discoveries in several
real data sets, each of them representing a particular class of analyses: genome-wide
association studies (GWASs), molecular trait quantitative trait loci (QTL) mapping
studies, and cancer biology association studies. Lastly, we explore the
potential of these approaches in radiogenomics, a brand new subfield of genetics and
genomics that focuses on the study of correlations between imaging or network features
and genetic variation.</p>
|
|