Bayesian variable selection in structured high-dimensional covariate spaces with applications in genomics
Date
2010-09-01
Journal Title
Journal ISSN
Volume Title
Repository Usage Stats
views
downloads
Citation Stats
Abstract
We consider the problem of variable selection in regression modeling in high-dimensional spaces where there is known structure among the covariates. This is an unconventional variable selection problem for two reasons: (1) The dimension of the covariate space is comparable, and often much larger, than the number of subjects in the study, and (2) the covariate space is highly structured, and in some cases it is desirable to incorporate this structural information in to the model building process. We approach this problem through the Bayesian variable selection framework, where we assume that the covariates lie on an undirected graph and formulate an Ising prior on the model space for incorporating structural information. Certain computational and statistical problems arise that are unique to such high-dimensional, structured settings, the most interesting being the phenomenon of phase transitions. We propose theoretical and computational schemes to mitigate these problems. We illustrate our methods on two different graph structures: the linear chain and the regular graph of degree k. Finally, we use our methods to study a specific application in genomics: the modeling of transcription factor binding sites in DNA sequences. © 2010 American Statistical Association.
Type
Department
Description
Provenance
Subjects
Citation
Permalink
Published Version (Please cite this version)
Publication Info
Li, F, and NR Zhang (2010). Bayesian variable selection in structured high-dimensional covariate spaces with applications in genomics. Journal of the American Statistical Association, 105(491). pp. 1202–1214. 10.1198/jasa.2010.tm08177 Retrieved from https://hdl.handle.net/10161/4400.
This is constructed from limited available data and may be imprecise. To cite this article, please review & use the official citation provided by the journal.
Collections
Scholars@Duke
Fan Li
My main research interest is causal inference and its applications to health, policy and social science. I also work on the interface between causal inference and machine learning. I have developed methods for propensity score, clinical trials, randomized experiments (e.g. A/B testing), difference-in-differences, regression discontinuity designs, representation learning. I also work on Bayesian analysis and statistical methods for missing data. I am serving as the editor for social science, biostatistics and policy for the journal Annals of Applied Statistics.
Unless otherwise indicated, scholarly articles published by Duke faculty members are made available here with a CC-BY-NC (Creative Commons Attribution Non-Commercial) license, as enabled by the Duke Open Access Policy. If you wish to use the materials in ways not already permitted under CC-BY-NC, please consult the copyright owner. Other materials are made available here through the author’s grant of a non-exclusive license to make their work openly accessible.