From Bench to Bedside: Advancing the Evaluation and Alignment of Machine Learning and Large Language Models for Clinical Impact in Healthcare
Date
2025
Abstract
Artificial Intelligence (AI) and Machine Learning (ML) offer significant promise for transforming healthcare by improving patient outcomes and operational efficiency. However, a substantial gap persists between the development of predictive ML models and their successful integration into real-world clinical practice. This dissertation studies the practical challenges of deploying AI/ML systems within healthcare environments, focusing specifically on evaluation methodologies that better align with real-world clinical and operational contexts. Through four distinct case studies, this work investigates longitudinal monitoring of deployed predictive models for advance care planning, proposes novel evaluation frameworks tailored to clinical time series prediction, and explores adaptive methodologies leveraging Large Language Models (LLMs) to address dynamic workflows in medical billing and patient communication tasks. Collectively, this research emphasizes the importance of ongoing model evaluation, fairness monitoring, and adaptability to changing clinical requirements, ultimately aiming to bridge the gap between machine learning research and real-world healthcare implementation.

First, we describe a large-scale longitudinal deployment of a machine learning-based alert system designed to prompt advance care planning discussions, focusing on continuous monitoring of predictive performance, fairness drift, and clinical impact. Second, we address the gap between classical predictive evaluation metrics and clinical decision-making needs in clinical time series prediction, proposing a novel evaluation framework that aligns metrics with clinical utility. Third, we develop an adaptive LLM-based framework to automate and dynamically update patient portal message routing workflows. Finally, we explore LLMs for the medical coding task, demonstrating their flexibility in the presence of changing coding standards and clinical guidelines. Together, these case studies highlight the necessity of robust, clinically aligned evaluation methodologies and adaptive modeling approaches to effectively integrate AI/ML innovations into healthcare practice.
Citation
Gao, Michael Yimeng (2025). From Bench to Bedside: Advancing the Evaluation and Alignment of Machine Learning and Large Language Models for Clinical Impact in Healthcare. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/33358.
Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.
