Gaussian Process-Based Models for Clinical Time Series in Healthcare




Clinical prediction models can help physicians make better data-driven decisions that improve patient outcomes. With the widespread adoption of electronic health records, a wealth of data is now available, and more flexible statistical models are required to account for the messiness and complexity of this data. In this dissertation we focus on developing models for clinical time series, as most data within healthcare is collected longitudinally and it is important to take this structure into account. Models built on Gaussian processes are a natural fit for this setting of irregularly sampled, noisy time series with many missing values. They also account for and quantify uncertainty, which can be extremely useful in medical decision making. We develop new Gaussian process-based models for medical time series, along with associated algorithms for efficient inference on large-scale electronic health records data. We apply these models to several real healthcare applications, using local data obtained from the Duke University healthcare system.
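To make the appeal of Gaussian processes for irregularly sampled data concrete, the following is a minimal sketch of standard zero-mean Gaussian process regression with a squared-exponential kernel. It is a textbook illustration, not the models developed in the dissertation: the function names, the kernel choice, the hyperparameter values, and the toy measurements are all assumptions made for this example.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D time points."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(t_obs, y_obs, t_new, noise_var=0.1,
                 length_scale=1.0, variance=1.0):
    """Posterior mean and variance of the latent function at t_new."""
    # Covariance of the noisy observations (hypothetical hyperparameters).
    K = rbf_kernel(t_obs, t_obs, length_scale, variance)
    K += noise_var * np.eye(len(t_obs))
    K_s = rbf_kernel(t_obs, t_new, length_scale, variance)
    K_ss = rbf_kernel(t_new, t_new, length_scale, variance)

    # Cholesky-based solves are more stable than forming an explicit inverse.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    v = np.linalg.solve(L, K_s)

    mean = K_s.T @ alpha
    var = np.diag(K_ss) - np.sum(v ** 2, axis=0)
    return mean, var

# Hypothetical, irregularly spaced measurement times (e.g. hours) and values.
t_obs = np.array([0.0, 0.7, 1.1, 3.9, 4.0, 7.5])
y_obs = np.sin(t_obs)

# Query the posterior on a regular grid, including times with no measurement.
t_new = np.linspace(0.0, 8.0, 9)
mean, var = gp_posterior(t_obs, y_obs, t_new)
```

Under these assumptions, the posterior variance is small near observed times and grows in long gaps between measurements, which is the kind of uncertainty quantification the abstract refers to.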

In Chapter 1, we give a brief overview of clinical prediction models, electronic health records, and Gaussian processes. In Chapter 2, we develop several Gaussian process models for clinical time series in the context of chronic kidney disease management. We show how our proposed joint model for longitudinal and time-to-event data, and our model for multivariate time series, can make accurate predictions about a patient's future disease trajectory. In Chapter 3, we combine multi-output Gaussian processes with a downstream black-box deep recurrent neural network. We apply this modeling framework to clinical time series to improve early detection of sepsis among patients in the hospital, and show that the Gaussian process preprocessing layer both allows for uncertainty quantification and acts as a form of data augmentation to reduce overfitting. In Chapter 4, we again use multi-output Gaussian processes as a preprocessing layer, this time in model-free deep reinforcement learning. Here the goal is to learn optimal treatments for sepsis given clinical time series and historical treatment decisions taken by clinicians, and we show that the Gaussian process preprocessing layer and the use of a recurrent architecture offer improvements over standard deep reinforcement learning methods. We conclude in Chapter 5 with a summary of areas for future work and a discussion of the practical considerations and challenges involved in deploying machine learning models into actual clinical practice.
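The idea of a Gaussian process preprocessing layer can be sketched as follows: condition on the irregular observations, then draw several posterior sample paths on a regular time grid, each of which could serve as one input sequence for a downstream recurrent network. This is a simplified, single-output illustration under assumed hyperparameters; the dissertation uses multi-output Gaussian processes, and the function names, grid, and toy data here are hypothetical.

```python
import numpy as np

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel with unit variance (an assumed choice)."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length_scale) ** 2)

def sample_gp_on_grid(t_obs, y_obs, t_grid, n_samples=5,
                      noise_var=0.1, seed=0):
    """Draw posterior sample paths of a zero-mean GP on a regular grid."""
    rng = np.random.default_rng(seed)
    K = rbf(t_obs, t_obs) + noise_var * np.eye(len(t_obs))
    K_s = rbf(t_obs, t_grid)
    K_ss = rbf(t_grid, t_grid)

    # Solve against the observations and the cross-covariance in one call.
    sol = np.linalg.solve(K, np.column_stack([y_obs, K_s]))
    mean = K_s.T @ sol[:, 0]
    cov = K_ss - K_s.T @ sol[:, 1:]

    # Symmetrize and add a small jitter so the Cholesky factor exists.
    cov = 0.5 * (cov + cov.T) + 1e-6 * np.eye(len(t_grid))
    L = np.linalg.cholesky(cov)

    # Each column of z yields one sample path: mean + L z.
    z = rng.standard_normal((len(t_grid), n_samples))
    return (mean[:, None] + L @ z).T  # shape: (n_samples, len(t_grid))

# Hypothetical irregular measurements for a single clinical variable.
t_obs = np.array([0.0, 0.7, 1.1, 3.9, 4.0, 7.5])
y_obs = np.sin(t_obs)

# Regular grid that a downstream sequence model could consume.
t_grid = np.arange(0.0, 9.0, 1.0)
samples = sample_gp_on_grid(t_obs, y_obs, t_grid, n_samples=5, seed=0)
```

Training on several sampled paths per patient, rather than on a single imputed series, is one way such a layer can act as data augmentation while carrying the imputation uncertainty forward.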





Futoma, Joseph David (2018). Gaussian Process-Based Models for Clinical Time Series in Healthcare. Dissertation, Duke University. Retrieved from


Duke's student scholarship is made available to the public using a Creative Commons Attribution / Non-commercial / No derivative (CC-BY-NC-ND) license.