Browsing by Subject "Dimensionality reduction"
Item Open Access Efficient Gaussian process regression for large datasets. (Biometrika, 2013-03) Banerjee, Anjishnu; Dunson, David B; Tokdar, Surya T
Gaussian processes are widely used in nonparametric regression, classification and spatiotemporal modelling, facilitated in part by a rich literature on their theoretical properties. However, one of their practical limitations is expensive computation, typically on the order of n^3 where n is the number of data points, in performing the necessary matrix inversions. For large datasets, storage and processing also lead to computational bottlenecks, and numerical stability of the estimates and predicted values degrades with increasing n. Various methods have been proposed to address these problems, including predictive processes in spatial data analysis and the subset-of-regressors technique in machine learning. The idea underlying these approaches is to use a subset of the data, but this raises questions concerning sensitivity to the choice of subset and limitations in estimating fine-scale structure in regions that are not well covered by the subset. Motivated by the literature on compressive sensing, we propose an alternative approach that involves linear projection of all the data points onto a lower-dimensional subspace. We demonstrate the superiority of this approach from a theoretical perspective and through simulated and real data examples.
Item Open Access Exploration and Application of Dimensionality Reduction and Clustering Techniques to Diabetes Patient Health Records (2017-05-24) Gopinath, Sidharth
This research examines various data dimensionality reduction techniques and clustering methods. The goal was to apply these ideas to a test dataset and a healthcare dataset to see how they practically work and what conclusions we could draw from their application.
Specifically, we hoped to identify similar clusters of diabetes patients and develop hypotheses of risk for adverse events for further research into sub-populations of diabetes patients. Upon further research and application, it became apparent that the data dimensionality reduction and clustering methods are sensitive to the parameter settings and must be fine-tuned carefully to be successful. Additionally, we saw several statistically significant differences in outcomes for the clusters identified with these data. We focused on coronary artery disease and kidney disease. Focusing on these clusters, we found a high proportion of patients taking medications for heart or kidney conditions. Based on these findings, we were able to decide on future paths building upon this research that could lead to more actionable conclusions.
Item Open Access Relating Traits to Electrophysiology using Factor Models (2020) Talbot, Austin B
Targeted stimulation of the brain has the potential to treat mental illnesses. The objective of this work is to develop methodology that enables scientists to design stimulation methods based on the electrophysiological dynamics. We first develop several factor models that characterize aspects of the dynamics relevant to these illnesses. Using a novel approach, we can then find a single predictive factor of the trait of interest. To improve the quality of the associated loadings, we develop a method for removing concomitant variables that can dominate the observed dynamics. We also develop a novel inference technique that increases the relevance of the predictive loadings. Finally, we demonstrate the efficacy of our methodology by finding a single factor responsible for social behavior. This factor is stimulated in new subjects and modifies behavior in the new individuals. These results indicate that our methodology has high potential in developing future cures of mental illness.
Item Open Access Tailored Scalable Dimensionality Reduction (2018) van den Boom, Willem
Although there is a rich literature on scalable methods for dimensionality reduction, the focus has been on widely applicable approaches which, in certain applications, are far from optimal or not even applicable. Dimensionality reduction can improve scalability of Bayesian computation, but optimal performance needs tailoring to the model. What kind of dimensionality reduction is sensible in data applications varies by the context of the data, resulting in neglect of information contained in data that do not fit general approaches.
This dissertation introduces dimensionality reduction methods tailored to specific computational or data applications. Firstly, we scale up posterior computation in Bayesian linear regression using a dimensionality reduction approach enabled by the linearity in the model. It approximately integrates out nuisance parameters from a high-dimensional likelihood. The resulting posterior approximation scheme is competitive with state-of-the-art scalable posterior inference methods while being easier to interpret, understand, and analyze due to the explicit use of dimensionality reduction. Bayesian variable selection is considered as an example of a challenging posterior where the dimensionality reduction speeds up computation greatly and accurately.
Secondly, we show how to reduce dimensionality based on data context in varying-domain functional data, where existing methods do not apply. The data of interest are intraoperative blood pressure and heart rate measurements. The first proposed approach extracts multiple different low-dimensional features from the high-dimensional blood pressure data, which are partly predefined and partly learnt from the data. This yields insights regarding blood pressure variability new to the clinical literature since such detailed inference was not possible with existing methods. The concluding case of dimensionality reduction is quantifying coupling of blood pressure and heart rate. This reduces two time series to one measurement of the strength of coupling. The results show the utility for inference methods of dimensionality reduction that is tailored to the challenge at hand.