Learning for Control and Decision Making toward Medical Autonomy
Date
2024
Abstract
Artificial intelligence (AI) and deep learning (DL) have recently shown success in domains related to healthcare and its decision-making systems. However, most existing methods are developed on benchmark environments that are often defined with simplistic dynamics and provide access to data that are well-structured, pre-processed, and abundant. It is often intractable to leverage such methods in real-world applications, since limited access to real-world healthcare environments leads to significantly reduced sample efficiency during training. Moreover, strict safety protocols are usually enforced in practice upon deployment to human participants, and policy selection criteria weigh human feedback (HF) more heavily than environmental returns; both of these aspects can be difficult to capture in simulations. From a data-logging perspective, data irregularities are often encountered in healthcare facilities, e.g., missingness due to malfunctioning devices.
This dissertation aims to introduce AI/ML methods that overcome these limitations, including insufficient and imperfect data, while complying with safety protocols, and that are applicable to real-world decision-making processes in healthcare systems. It focuses on (i) sample-efficient reinforcement learning (RL) frameworks that can synthesize control policies for medical devices to maximize both environmental returns and HF in an offline manner, with off-policy evaluation (OPE) facilitating the evaluation of RL policies without online interactions, i.e., for improved safety and efficiency upon deployment of the RL policies; and (ii) DL-based analyses of multivariate, multi-modal healthcare data to facilitate clinical decision-making systems, by tackling data irregularities and capturing underlying factors important to automated disease diagnoses and prognoses.
To tackle (i), we introduce an algorithmic OPE framework, the variational latent branching model (VLBM), which can be integrated with most existing offline RL methods for efficient and safe policy evaluation and selection upon deployment. Specifically, it leverages variational inference to learn the transition function of MDPs by formulating the environmental dynamics as a compact latent space, from which the next states and rewards are then sampled. Its efficacy is validated on benchmark environments including MuJoCo and Adroit. Then, an OPE for human feedback (OPEHF) method is developed on top of the VLBM framework to capture the HF that participants could have provided once the policies are deployed, further ensuring the satisfaction of human participants who receive procedures or medical devices guided by RL agents. Finally, we design a full-stack offline RL policy optimization pipeline, into which both OPE methods are integrated, for training control policies of an implantable deep brain stimulation (DBS) device for the treatment of Parkinson's disease (PD) by adjusting the stimulation amplitude in real time. The goal is to reduce the energy used for generating the stimulus, while maintaining the same level of treatment (i.e., control) efficacy as continuous DBS (cDBS), i.e., constantly stimulating with the highest amplitude possible, as determined by clinicians. The efficacy is validated in a cohort of five human participants, where results show that the OPE components are able to pinpoint high-performing policies among the policy candidates trained using different offline RL algorithms or hyper-parameter sets.
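The core idea of model-based OPE, as described above, is to learn a transition model from logged data and then roll out candidate policies inside that model instead of the real environment. The toy sketch below illustrates this with a simple linear model in place of VLBM's variational latent dynamics; the 1-D environment, the candidate policies, and all names here are illustrative assumptions, not the dissertation's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D environment (assumed for illustration):
# s' = 0.9*s + a + noise, reward = -s^2 - 0.1*a^2.
def step(s, a):
    return 0.9 * s + a + 0.01 * rng.normal(), -(s ** 2) - 0.1 * a ** 2

# Offline dataset logged by a random behavior policy.
S, A, R, S2 = [], [], [], []
s = 1.0
for _ in range(2000):
    a = rng.uniform(-1, 1)
    s2, r = step(s, a)
    S.append(s); A.append(a); R.append(r); S2.append(s2)
    s = s2 if abs(s2) < 5 else 1.0
S, A, R, S2 = map(np.array, (S, A, R, S2))

# Fit simple linear dynamics/reward models from the offline data
# (a stand-in for the learned latent transition model).
X = np.stack([S, A, np.ones_like(S)], axis=1)
w_dyn, *_ = np.linalg.lstsq(X, S2, rcond=None)
Xq = np.stack([S ** 2, A ** 2, np.ones_like(S)], axis=1)
w_rew, *_ = np.linalg.lstsq(Xq, R, rcond=None)

def ope_return(policy, s0=1.0, horizon=50):
    """Estimate a policy's return by rolling it out in the learned model."""
    s, total = s0, 0.0
    for _ in range(horizon):
        a = policy(s)
        total += w_rew @ np.array([s ** 2, a ** 2, 1.0])
        s = w_dyn @ np.array([s, a, 1.0])
    return total

# Rank two candidate policies without touching the real environment.
good = lambda s: -0.9 * s   # drives the state toward 0
bad = lambda s: 0.5 * s     # pushes the state away from 0
print(ope_return(good) > ope_return(bad))  # → True
```

The ranking step at the end mirrors how OPE is used in the pipeline: candidate policies from different offline RL algorithms or hyper-parameter sets are scored in the learned model, and only the highest-scoring ones are deployed.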
In terms of (ii), we introduce three frameworks to address the following challenges, respectively. (ii.a) Data missingness, e.g., due to the non-periodic logging of patient vitals or lab results. (ii.b) High dimensionality and multi-modality within healthcare data, given that a substantial amount of lab/vital results needs to be recorded, and such information can come in the form of images (e.g., CT scans), tabular data (e.g., demographic information), etc. The efficacy of each framework is validated through retrospective studies pertaining to the identification, prognosis, and treatment of ophthalmic diseases. The results show that our frameworks are able to accurately identify the presence of diseases and automatically design treatment plans.
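To make challenge (ii.a) concrete, a common baseline for irregularly logged clinical time series is to forward-fill missing entries and append a binary missingness mask so a downstream model can tell observed values from imputed ones. The sketch below shows this generic approach; the data and function names are illustrative and do not reflect the dissertation's specific frameworks.

```python
import numpy as np

# Irregularly logged vitals: NaN marks missing entries (e.g., a sensor
# outage or non-periodic lab draws). Rows are time steps, columns features.
vitals = np.array([
    [98.6,   72.0, np.nan],
    [np.nan, 75.0, 120.0],
    [99.1, np.nan, np.nan],
    [98.9,   71.0, 118.0],
])

def fill_and_mask(x):
    """Last-observation-carried-forward plus a binary missingness mask."""
    mask = ~np.isnan(x)
    filled = x.copy()
    for t in range(1, len(filled)):
        holes = np.isnan(filled[t])
        filled[t, holes] = filled[t - 1, holes]
    # Leading NaNs fall back to the column mean of observed values.
    col_means = np.nanmean(x, axis=0)
    still = np.isnan(filled)
    filled[still] = np.take(col_means, np.where(still)[1])
    return filled, mask.astype(float)

filled, mask = fill_and_mask(vitals)
features = np.concatenate([filled, mask], axis=1)  # model input: values + mask
```

Concatenating the mask doubles the feature width but preserves the information that a value was imputed, which learned models can exploit, for instance, by discounting stale forward-filled readings.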
Citation
Gao, Qitong (2024). Learning for Control and Decision Making toward Medical Autonomy. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/30887.