Bayesian variable selection in structured high-dimensional covariate spaces with applications in genomics
Abstract
We consider the problem of variable selection in regression modeling in high-dimensional
spaces where there is known structure among the covariates. This is an unconventional
variable selection problem for two reasons: (1) The dimension of the covariate space
is comparable to, and often much larger than, the number of subjects in the study, and
(2) the covariate space is highly structured, and in some cases it is desirable to
incorporate this structural information into the model-building process. We approach
this problem through the Bayesian variable selection framework, where we assume that
the covariates lie on an undirected graph and formulate an Ising prior on the model
space for incorporating structural information. Certain computational and statistical
problems arise that are unique to such high-dimensional, structured settings, the
most interesting being the phenomenon of phase transitions. We propose theoretical
and computational schemes to mitigate these problems. We illustrate our methods on
two different graph structures: the linear chain and the regular graph of degree k.
Finally, we use our methods to study a specific application in genomics: the modeling
of transcription factor binding sites in DNA sequences. © 2010 American Statistical
Association.
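
As a rough illustration of the Ising prior idea mentioned in the abstract, the sketch below places a prior on binary inclusion indicators attached to the nodes of a linear chain. The parametrization (a per-variable term `a` and a pairwise interaction term `b`), the function names, and the numeric values are assumptions made for illustration, not taken from the article; the article's exact formulation may differ. Brute-force enumeration is used only because the example is tiny.

```python
import math
from itertools import product


def chain_edges(p):
    """Edges of a linear chain on p nodes: (0,1), (1,2), ..., (p-2,p-1)."""
    return [(j, j + 1) for j in range(p - 1)]


def log_ising_prior_unnormalized(gamma, edges, a, b):
    """Unnormalized log prior of an inclusion vector gamma in {0,1}^p.

    Assumed form: a * sum_j gamma_j + b * sum_{(i,j) in edges} gamma_i * gamma_j.
    A negative `a` favors sparse models; a positive `b` favors including
    variables that are neighbors on the graph.
    """
    sparsity = a * sum(gamma)
    smoothness = b * sum(gamma[i] * gamma[j] for i, j in edges)
    return sparsity + smoothness


def ising_prior_by_enumeration(p, edges, a, b):
    """Normalize over all 2^p models; feasible only for very small p."""
    configs = list(product((0, 1), repeat=p))
    weights = [
        math.exp(log_ising_prior_unnormalized(g, edges, a, b)) for g in configs
    ]
    z = sum(weights)
    return {g: w / z for g, w in zip(configs, weights)}


# Hypothetical values chosen only for illustration.
prior = ising_prior_by_enumeration(p=4, edges=chain_edges(4), a=-2.0, b=1.5)

# Two models with the same size but different graph structure:
adjacent = prior[(1, 1, 0, 0)]   # selected variables are neighbors on the chain
separated = prior[(1, 0, 1, 0)]  # selected variables are not neighbors
ratio = adjacent / separated     # equals exp(b) under the assumed form
```

Under this assumed form, two models of equal size differ in prior probability only through the number of selected neighboring pairs, which is how the graph structure enters the prior. Any other edge list, such as one for a regular graph of degree k, can be passed in place of `chain_edges`.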
Type
Journal article
Permalink
https://hdl.handle.net/10161/4400
Published Version (Please cite this version)
10.1198/jasa.2010.tm08177
Publication Info
Li, F; & Zhang, NR (2010). Bayesian variable selection in structured high-dimensional covariate spaces with applications
in genomics. Journal of the American Statistical Association, 105(491). pp. 1202-1214. 10.1198/jasa.2010.tm08177. Retrieved from https://hdl.handle.net/10161/4400.
This is constructed from limited available data and may be imprecise. To cite this
article, please review & use the official citation provided by the journal.
Scholars@Duke
Fan Li
Professor of Statistical Science
My main research interest is causal inference and its applications to health, policy
and social science. I also work on the interface between causal inference and machine
learning. I have developed methods for propensity scores, clinical trials, randomized
experiments (e.g., A/B testing), difference-in-differences, regression discontinuity
designs, and representation learning. I also work on Bayesian analysis and statistical
methods for missing data. I am serving as the editor for social science, bios
