Efficient Gaussian process regression for large datasets
Abstract
Gaussian processes are widely used in nonparametric regression, classification and
spatiotemporal modelling, facilitated in part by a rich literature on their theoretical
properties. However, one of their practical limitations is expensive computation,
typically on the order of n^3, where n is the number of data points, in performing
the necessary matrix inversions. For large datasets, storage and processing also lead
to computational bottlenecks, and numerical stability of the estimates and predicted
values degrades with increasing n. Various methods have been proposed to address these
problems, including predictive processes in spatial data analysis and the subset-of-regressors
technique in machine learning. The idea underlying these approaches is to use a subset
of the data, but this raises questions concerning sensitivity to the choice of subset
and limitations in estimating fine-scale structure in regions that are not well covered
by the subset. Motivated by the literature on compressive sensing, we propose an alternative
approach that involves linear projection of all the data points onto a lower-dimensional
subspace. We demonstrate the superiority of this approach from a theoretical perspective
and through simulated and real data examples.
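To make the cost argument concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a zero-mean GP with a squared-exponential kernel, fixed hyperparameters (`length_scale`, `noise`), and a dense Gaussian random projection matrix `Phi`. All function names are hypothetical. The exact posterior mean requires factorizing an n x n matrix, which is the O(n^3) step. The projected version conditions only on the m compressed observations z = Phi y, so it factorizes an m x m matrix instead. This naive version still forms the full n x n kernel to compute Phi K Phi^T, so it illustrates the projection idea rather than any memory savings.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.3, variance=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def _chol_solve(A, rhs):
    """Solve A @ v = rhs for symmetric positive-definite A via a Cholesky factor."""
    L = np.linalg.cholesky(A)  # A = L @ L.T, L lower triangular
    return np.linalg.solve(L.T, np.linalg.solve(L, rhs))


def exact_gp_mean(x, y, x_new, noise=0.1):
    """Exact posterior mean: factorizing the n x n matrix costs O(n^3)."""
    K = rbf_kernel(x, x) + noise**2 * np.eye(len(x))  # Cov(y)
    return rbf_kernel(x_new, x) @ _chol_solve(K, y)


def projected_gp_mean(x, y, x_new, m=50, noise=0.1, seed=0):
    """Posterior mean given only z = Phi @ y, with Phi an m x n Gaussian random matrix.

    The linear solve is m x m, but this naive version still forms the n x n kernel.
    """
    rng = np.random.default_rng(seed)
    Phi = rng.standard_normal((m, len(x))) / np.sqrt(m)
    K = rbf_kernel(x, x) + noise**2 * np.eye(len(x))  # Cov(y)
    S = Phi @ K @ Phi.T                                 # Cov(z), m x m
    cross = rbf_kernel(x_new, x) @ Phi.T                # Cov(f(x_new), z)
    return cross @ _chol_solve(S, Phi @ y)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = np.sort(rng.uniform(0.0, 1.0, 500))
    y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(500)
    x_new = np.linspace(0.0, 1.0, 5)
    print("exact:    ", exact_gp_mean(x, y, x_new))
    print("projected:", projected_gp_mean(x, y, x_new, m=50))
```

With m = n, the Gaussian `Phi` is invertible almost surely, and the projected mean reduces algebraically to the exact one. Shrinking m trades accuracy for a smaller factorization. A dense Gaussian `Phi` is one common choice in the compressive sensing literature; the projection used in the paper may be constructed differently.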
Type
Journal article
Subject
Bayesian regression
Compressive sensing
Dimensionality reduction
Gaussian process
Random projection
Permalink
http://hdl.handle.net/10161/15591
Published Version (Please cite this version)
10.1093/biomet/ass068
Publication Info
Banerjee, A; Dunson, David B; & Tokdar, ST (2013). Efficient Gaussian process regression for large datasets. Biometrika, 100(1). pp. 75-89. 10.1093/biomet/ass068. Retrieved from http://hdl.handle.net/10161/15591. This is constructed from limited available data and may be imprecise. To cite this article, please review & use the official citation provided by the journal.
Scholars@Duke
David B. Dunson
Arts and Sciences Professor of Statistical Science
Development of novel approaches for representing and analyzing complex data. A particular
focus is on methods that incorporate geometric structure (both known and unknown)
and on probabilistic approaches to characterize uncertainty. In addition, a big interest
is in scalable algorithms and in developing approaches with provable guarantees. This
fundamental work is directly motivated by applications in biomedical research, network
data analysis, neuroscience, genomics, ecol

Articles written by Duke faculty are made available through the campus open access policy. For more information see: Duke Open Access Policy
Rights for Collection: Scholarly Articles