Efficient Gaussian process regression for large datasets.
Abstract
Gaussian processes are widely used in nonparametric regression, classification and
spatiotemporal modelling, facilitated in part by a rich literature on their theoretical
properties. However, one of their practical limitations is expensive computation,
typically on the order of n^3, where n is the number of data points, in performing
the necessary matrix inversions. For large datasets, storage and processing also lead
to computational bottlenecks, and numerical stability of the estimates and predicted
values degrades with increasing n. Various methods have been proposed to address these
problems, including predictive processes in spatial data analysis and the subset-of-regressors
technique in machine learning. The idea underlying these approaches is to use a subset
of the data, but this raises questions concerning sensitivity to the choice of subset
and limitations in estimating fine-scale structure in regions that are not well covered
by the subset. Motivated by the literature on compressive sensing, we propose an alternative
approach that involves linear projection of all the data points onto a lower-dimensional
subspace. We demonstrate the superiority of this approach from a theoretical perspective
and through simulated and real data examples.
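
The projection idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' code or their exact algorithm. It is one plausible reading, under stated assumptions: the latent function values f at all n inputs are replaced by their conditional expectation given the m-dimensional projection Phi f, which yields the low-rank prior covariance K Phi^T (Phi K Phi^T)^{-1} Phi K. The squared-exponential kernel, the i.i.d. Gaussian choice of Phi, the jitter term, and all function names below are assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / lengthscale**2)


def projected_factor(K, m, rng, jitter=1e-6):
    """Return W (m x n) with W.T @ W = K Phi^T (Phi K Phi^T)^{-1} Phi K.

    Phi is an m x n matrix of i.i.d. Gaussian entries (an assumed choice).
    The identity holds up to the small jitter added for numerical stability.
    """
    n = K.shape[0]
    Phi = rng.standard_normal((m, n)) / np.sqrt(m)
    B = Phi @ K                      # m x n: covariance between Phi f and f
    C = B @ Phi.T                    # m x m: covariance of Phi f
    C = 0.5 * (C + C.T)              # symmetrize against rounding error
    C += jitter * np.trace(C) / m * np.eye(m)
    L = np.linalg.cholesky(C)        # C = L @ L.T
    return np.linalg.solve(L, B)     # W = L^{-1} B


def low_rank_posterior_mean(W, y, noise_var):
    """Posterior mean at the training inputs for prior covariance W.T @ W.

    Uses the identity Q (Q + s I_n)^{-1} y = W^T (W W^T + s I_m)^{-1} W y,
    so only an m x m linear system is solved instead of an n x n one.
    """
    m = W.shape[0]
    small = W @ W.T + noise_var * np.eye(m)
    return W.T @ np.linalg.solve(small, W @ y)


# Toy data: a noisy sine curve observed at n points, projected down to m.
rng = np.random.default_rng(0)
n, m, noise_var = 500, 30, 0.01
x = np.sort(rng.uniform(0.0, 1.0, n))
y = np.sin(2 * np.pi * x) + np.sqrt(noise_var) * rng.standard_normal(n)

W = projected_factor(rbf_kernel(x, x), m, rng)
fitted = low_rank_posterior_mean(W, y, noise_var)
```

In this sketch every data point contributes to the m-dimensional summary through Phi, in contrast to methods that keep only a subset of the points. Note that the sketch still forms the full n x n kernel matrix and the product Phi K, so it only removes the n^3 solve; it does not address storage.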
Type
Journal article
Subject
Bayesian regression
Compressive sensing
Dimensionality reduction
Gaussian process
Random projection
Permalink
https://hdl.handle.net/10161/15591
Published Version (Please cite this version)
10.1093/biomet/ass068
Publication Info
Banerjee, Anjishnu; Dunson, David B; & Tokdar, Surya T (2013). Efficient Gaussian process regression for large datasets. Biometrika, 100(1). pp. 75-89. 10.1093/biomet/ass068. Retrieved from https://hdl.handle.net/10161/15591.
This is constructed from limited available data and may be imprecise. To cite this
article, please review & use the official citation provided by the journal.
Scholars@Duke
David B. Dunson
Arts and Sciences Distinguished Professor of Statistical Science
My research focuses on developing new tools for probabilistic learning from complex
data. Methods development is directly motivated by challenging applications in ecology/biodiversity,
neuroscience, environmental health, criminal justice/fairness, and more. We seek
to develop new modeling frameworks, algorithms and corresponding code that can be
used routinely by scientists and decision makers. We are also interested in new inference
frameworks and in studying theoretical properties
Surya Tapas Tokdar
Professor of Statistical Science