BAYESIAN MODEL SEARCH AND MULTILEVEL INFERENCE FOR SNP ASSOCIATION STUDIES.
Abstract
Technological advances in genotyping have given rise to hypothesis-based association
studies of increasing scope. As a result, the scientific hypotheses addressed by these
studies have become more complex and more difficult to address using existing analytic
methodologies. Obstacles to analysis include inference in the face of multiple comparisons,
complications arising from correlations among the SNPs (single nucleotide polymorphisms),
choice of their genetic parametrization and missing data. In this paper we present
an efficient Bayesian model search strategy that searches over the space of genetic
markers and their genetic parametrization. The resulting method for Multilevel Inference
of SNP Associations, MISA, allows computation of multilevel posterior probabilities
and Bayes factors at the global, gene and SNP level, with the prior distribution on
SNP inclusion in the model providing an intrinsic multiplicity correction. We use
simulated data sets to characterize MISA's statistical power, and show that MISA has
higher power to detect association than standard procedures. Using data from the North
Carolina Ovarian Cancer Study (NCOCS), MISA identifies variants that were not identified
by standard methods and have been externally "validated" in independent studies. We
examine sensitivity of the NCOCS results to prior choice and method for imputing missing
data. MISA is available in an R package on CRAN.
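The multilevel summaries described in the abstract can be illustrated with a minimal toy sketch. It is not the MISA implementation: it enumerates every model instead of searching, and it does not use the R package's API. The gene and SNP names, the per-SNP log-evidence values, and the additive `log_marginal` score are hypothetical placeholders. The uniform beta-binomial prior on model size is one common way to obtain a multiplicity correction, and the abstract does not say whether it matches the prior used in the paper. Each Bayes factor is computed as posterior odds divided by prior odds for the event "at least one of these SNPs is in the model".

```python
from itertools import combinations
from math import comb, exp

# Hypothetical toy inputs: two genes, three SNPs (names are placeholders).
genes = {"geneA": ["snp1", "snp2"], "geneB": ["snp3"]}
snps = [s for members in genes.values() for s in members]

# Hypothetical per-SNP log-evidence contributions (not real data).
log_evidence = {"snp1": 2.0, "snp2": -0.5, "snp3": -0.3}


def log_marginal(model):
    """Toy additive stand-in for a model's log marginal likelihood."""
    return sum(log_evidence[s] for s in model)


def model_prior(model, n_snps):
    """Beta-binomial(1, 1) prior: uniform over model size, then uniform
    over the models of that size. Assumed here for illustration."""
    return 1.0 / ((n_snps + 1) * comb(n_snps, len(model)))


def model_probabilities(snps):
    """Enumerate all subsets of SNPs; return prior and posterior dicts."""
    models = [frozenset(c) for k in range(len(snps) + 1)
              for c in combinations(snps, k)]
    prior = {m: model_prior(m, len(snps)) for m in models}
    weight = {m: prior[m] * exp(log_marginal(m)) for m in models}
    total = sum(weight.values())
    post = {m: w / total for m, w in weight.items()}
    return prior, post


def event_summary(prior, post, members):
    """Prior probability, posterior probability and Bayes factor for the
    event that at least one SNP in `members` is included in the model."""
    members = set(members)
    prior_p = sum(p for m, p in prior.items() if m & members)
    post_p = sum(p for m, p in post.items() if m & members)
    bayes_factor = (post_p / (1 - post_p)) / (prior_p / (1 - prior_p))
    return prior_p, post_p, bayes_factor


prior, post = model_probabilities(snps)
snp_level = {s: event_summary(prior, post, [s]) for s in snps}
gene_level = {g: event_summary(prior, post, m) for g, m in genes.items()}
global_level = event_summary(prior, post, snps)

if __name__ == "__main__":
    for label, table in (("SNP", snp_level), ("gene", gene_level)):
        for name, (pr, po, bf) in table.items():
            print(f"{label:4s} {name}: prior={pr:.3f} post={po:.3f} BF={bf:.2f}")
    pr, po, bf = global_level
    print(f"global: prior={pr:.3f} post={po:.3f} BF={bf:.2f}")
```

Because each level is defined as a union of the events below it, the gene-level probability is never smaller than that of any SNP in the gene, and the global probability is never smaller than that of any gene. Exhaustive enumeration only works for a handful of SNPs, which is why a model search strategy is needed at realistic scale.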
Type
Journal article
Permalink
https://hdl.handle.net/10161/8405
Scholars@Duke
Merlise Clyde
Professor of Statistical Science
Model uncertainty and choice in prediction and variable selection problems for linear,
generalized linear models and multivariate models. Bayesian Model Averaging. Prior
distributions for model selection and model averaging. Wavelets and adaptive kernel
non-parametric function estimation. Spatial statistics. Experimental design for
nonlinear models. Applications in proteomics, bioinformatics, astro-statistics,
air pollution and health effects, and environmental sciences.
Edwin Severin Iversen Jr.
Research Professor of Statistical Science
Bayesian statistical modeling with application to problems in genetic epidemiology
and cancer research; models for epidemiological risk assessment, including hierarchical
methods for combining related epidemiological studies; ascertainment corrections for
high risk family data; analysis of high-throughput genomic data sets.
Joellen Martha Schildkraut
Professor Emeritus in Family Medicine and Community Health
Dr. Schildkraut is an epidemiologist whose research includes the molecular epidemiology
of ovarian, breast and brain cancers. Dr. Schildkraut's research interests include
the study of the interaction between genetic and environmental factors. She is currently
involved in a large study of genome wide association and ovarian cancer risk and survival.
Some of her work is also focused on particular genetic pathways including the DNA
repair and apoptosis pathways. She currently leads a study of
Scott C. Schmidler
Associate Professor of Statistical Science
Alphabetical list of authors with Scholars@Duke profiles.
