Developing and Validating Digital Biomarkers of Glycemic Health through the Creation of a Benchmark Dataset

Limited Access
This item is unavailable until:
2028-02-03

Date

2025

Abstract

Type 2 diabetes (T2D) and its precursor states, including impaired fasting glucose (IFG) and impaired glucose tolerance (IGT), develop gradually over years. During this early phase, individuals may experience intermittent dysglycemia, altered autonomic regulation, and subtle cardiometabolic stress, often without meeting clinical thresholds for diagnosis. Current frontline screening tools, namely fasting glucose, hemoglobin A1c (HbA1c), and, less commonly, the oral glucose tolerance test (OGTT), are limited in their ability to capture this early biology. HbA1c and spot glucose tests provide static, low-resolution snapshots of a highly dynamic system, and they can fail in both directions: they can classify individuals with ongoing physiologic dysregulation as “normal,” and they can label others as high risk even when physiology is improving. The OGTT, which probes post-challenge glucose handling over time and can detect IGT before overt T2D, is more sensitive but remains underused in practice because it is time-intensive, clinic-bound, and logistically burdensome. As a result, many people at greatest risk for progression to T2D are either not identified or not stratified in a way that supports targeted intervention.

This thesis addresses that gap by developing and validating digital biomarkers of dysglycemia derived from passively and semi-passively collected physiological data in free-living settings from individuals with normoglycemia, prediabetes (PD), and T2D in the DiabetesWatch study. We designed and executed a novel, fully remote, sensor-rich digital health study to create and characterize a benchmark dataset that captures both dynamic glucose responses and continuous cardiometabolic physiology outside the clinic.
We then leverage two complementary data streams collected in DiabetesWatch: (1) continuous glucose monitoring (CGM)-based meal challenge data enabling an at-home mobile OGTT (mOGTT), and (2) continuous multimodal physiological signals measured from wrist-worn wearable devices, including heart rate, heart rate variability (HRV), peripheral oxygen saturation (SpO₂), electrodermal activity, skin temperature, sleep, and activity, to model autonomic, respiratory, and metabolic stress physiology. Across these datasets, we design computational methods to extract physiologically interpretable features, quantify individual metabolic responses, and map those responses to established clinical constructs.

We present the DiabetesWatch dataset as a resource for discovering digital biomarkers of glycemic health. The cohort is deeply phenotyped with 14 days of free-living continuous glucose monitoring, commercial and research-grade wearable data, detailed food logs, HbA1c, and an oral glucose tolerance test. To our knowledge, this is the largest dataset of its kind spanning normoglycemia, prediabetes, and unmedicated type 2 diabetes with this level of multimodal categorization. Using DiabetesWatch, we demonstrate the feasibility of fully remote digital health studies for glycemic health that achieve high participant satisfaction, demographic diversity, broad geographic reach, and strong interest in follow-up, while capturing rich, heterogeneous sensor data, including relatively underexplored modalities such as SpO₂ and electrodermal activity. The study design and data collection pipeline provide a replicable template for future glycemic health research.

We also introduce the mOGTT, a CGM-enabled alternative to the traditional OGTT that can be performed outside the clinic.
In the mOGTT, participants ingest a standardized glucose challenge while wearing a CGM, allowing us to reconstruct classic OGTT readouts, namely fasting glucose, post-challenge glucose, total glycemic exposure over two hours (area under the curve, AUC), and recovery back toward baseline, without venipuncture or repeated in-person sampling. Using data collected from adults spanning normoglycemia, PD, and T2D, we show that mOGTT metrics reproduce clinically meaningful patterns observed in the traditional OGTT. These metrics align with diagnostic thresholds for IFG and IGT and capture a wide range of physiologic phenotypes, including elevated post-load excursions, delayed recovery, large total glycemic burden, and persistent fasting elevation. Importantly, we observe that individuals with “normal” HbA1c can still exhibit multi-metric dysglycemia on the mOGTT, while some individuals with HbA1c in the PD or T2D range show physiologic improvement (e.g., rapid recovery, low excursions), suggesting that HbA1c alone can both under-call emerging risk and over-call people who may be reversing disease through lifestyle change.

We then formalize these glucose dynamics into a composite glucose challenge response index (GCRI) score. This score is constructed by quantifying, for each participant, how far their fasting, peak, post-challenge, and cumulative exposure metrics deviate from clinically used OGTT thresholds, and then aggregating those deviations into a single continuous index of dysglycemia. Methodologically, this treats glycemic regulation as a multidimensional control problem rather than a single cutoff problem. We show that the GCRI increases monotonically with clinical risk and generalizes to an independent external cohort.

Further, we evaluate whether chronic, passively acquired physiology from wearables can provide a scalable, fully noninvasive screen for dysglycemia.
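As a rough illustration of the mOGTT readouts and the threshold-deviation scoring described in this abstract, the sketch below computes fasting, peak, two-hour, and AUC values from a CGM trace and averages their relative excess over reference values. It is not the thesis's GCRI definition: the function names, the equal-weight averaging, and the peak and AUC reference values are assumptions made for illustration, and only the fasting (100 mg/dL) and two-hour (140 mg/dL) cut-offs correspond to the commonly cited lower bounds for IFG and IGT. CGM measures interstitial rather than plasma glucose, so applying plasma thresholds directly is itself a simplification.

```python
from typing import Dict, List, Tuple

# Reference values in mg/dL. The first two follow commonly cited OGTT cut-offs;
# the last two are hypothetical placeholders chosen only for this sketch.
FASTING_REF = 100.0          # lower bound for impaired fasting glucose
TWO_HOUR_REF = 140.0         # lower bound for impaired glucose tolerance
PEAK_REF = 180.0             # hypothetical
AUC_REF = TWO_HOUR_REF * 120  # hypothetical, in mg/dL * min


def mogtt_metrics(samples: List[Tuple[float, float]]) -> Dict[str, float]:
    """Summarize a glucose challenge from (minutes since challenge, mg/dL) pairs.

    Assumes samples are sorted by time, start at minute 0, and end at or
    near minute 120.
    """
    window = [(t, g) for t, g in samples if 0 <= t <= 120]
    fasting = window[0][1]
    two_hour = window[-1][1]
    peak = max(g for _, g in window)
    # Trapezoidal area under the glucose curve over the window.
    auc = sum((t1 - t0) * (g0 + g1) / 2
              for (t0, g0), (t1, g1) in zip(window, window[1:]))
    return {
        "fasting": fasting,
        "peak": peak,
        "two_hour": two_hour,
        "auc": auc,
        "recovery_gap": two_hour - fasting,  # distance from baseline at 2 h
    }


def deviation_index(metrics: Dict[str, float]) -> float:
    """Mean relative excess over the reference values (0.0 means none exceeded).

    Equal weighting is an assumption of this sketch, not the GCRI formula.
    """
    refs = {"fasting": FASTING_REF, "peak": PEAK_REF,
            "two_hour": TWO_HOUR_REF, "auc": AUC_REF}
    excess = [max(0.0, metrics[name] / ref - 1.0) for name, ref in refs.items()]
    return sum(excess) / len(excess)


if __name__ == "__main__":
    trace = [(0, 110), (30, 190), (60, 210), (90, 190), (120, 170)]
    summary = mogtt_metrics(trace)
    print(summary, round(deviation_index(summary), 3))
```

A trace that stays below every reference value scores 0.0, and the score grows continuously as more metrics exceed their references or exceed them by more, which mirrors the idea of a continuous index rather than a single cut-off.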
From continuous wrist-worn data streams, we derive autonomic and cardiometabolic features including resting heart rate, HRV, oxygen saturation, thermoregulatory fluctuation, and sleep-linked recovery metrics. We train and evaluate multiple predictive model families (regularized logistic regression, random forest, gradient boosting, and XGBoost, along with simple ensembles) using an analysis pipeline with strict separation between model selection and testing. These models distinguish PD/T2D from normoglycemia with moderate-to-high discrimination (area under the ROC curve up to 0.90 and area under the precision–recall curve up to ~0.96 in held-out evaluation), driven by physiologic signatures consistent with known early cardiometabolic stress: elevated resting heart rate, autonomic imbalance, oxygenation instability, and blunted nighttime recovery. The features that repeatedly emerge as most informative are physiologically coherent and reproducible across model classes, suggesting that subclinical metabolic dysfunction is already imprinted in cardiovascular and autonomic control before overt diabetes.

Taken together, this thesis demonstrates an end-to-end framework for digital glycemic phenotyping: from at-home, CGM-based dynamic stress testing (mOGTT), to composite dysglycemia scoring tied to clinical thresholds, to passive cardiometabolic risk screening from wearables. The central conclusion is that continuous and near-continuous biosensing can surface early metabolic dysfunction in a way that is richer, more personalized, and more scalable than traditional single-point clinical tests. By translating raw signals into quantitative digital biomarkers, this work lays the groundwork for proactive, physiology-aware monitoring of PD and emerging T2D.
It supports a shift from episodic detection to continuous risk assessment, and from binary diagnosis to mechanism-informed stratification, with the longer-term goal of enabling earlier intervention, individualized prevention, and remote metabolic care at population scale.
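The "strict separation between model selection and testing" mentioned in the abstract can be illustrated with a minimal, standard-library sketch: candidates are compared on one stratified half of the data, and the ROC AUC is reported once on the other half. This is not the thesis pipeline. The fixed scoring rules stand in for fitted models, and the names `stratified_halves`, `select_then_test`, and the `resting_hr` feature are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# One participant: (feature dict, label), with 1 = PD/T2D and 0 = normoglycemia.
Row = Tuple[Dict[str, float], int]
Scorer = Callable[[Dict[str, float]], float]


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win; this equals the area under the ROC curve.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute ROC AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_halves(rows: List[Row], seed: int = 0) -> Tuple[List[Row], List[Row]]:
    """Split each class in half so both parts contain both labels."""
    rng = random.Random(seed)
    selection, test = [], []
    for label in (0, 1):
        group = [r for r in rows if r[1] == label]
        rng.shuffle(group)
        half = len(group) // 2
        selection += group[:half]
        test += group[half:]
    return selection, test


def select_then_test(rows: List[Row], candidates: Dict[str, Scorer],
                     seed: int = 0) -> Tuple[str, float]:
    """Pick the best candidate on the selection half, then score it on the test half."""
    selection, test = stratified_halves(rows, seed)

    def auc_on(part: List[Row], scorer: Scorer) -> float:
        return roc_auc([scorer(x) for x, _ in part], [y for _, y in part])

    best = max(candidates, key=lambda name: auc_on(selection, candidates[name]))
    # The test half is touched exactly once, after the choice is made.
    return best, auc_on(test, candidates[best])


if __name__ == "__main__":
    # Synthetic toy data, not study data.
    rows = [({"resting_hr": 55.0 + i}, 0) for i in range(10)]
    rows += [({"resting_hr": 70.0 + i}, 1) for i in range(10)]
    candidates = {"resting_hr": lambda f: f["resting_hr"],
                  "constant": lambda f: 0.0}
    print(select_then_test(rows, candidates))
```

Because the test half plays no part in choosing the candidate, its AUC is not inflated by the selection step. A fuller version would fit each model on training folds and tune it by cross-validation inside the selection half.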

Subjects

Biomedical engineering, Artificial Intelligence, Digital Biomarkers, Digital Health Technologies, Machine Learning, Metabolic Health, Wearables

Citation

Singh, Karnika (2025). Developing and Validating Digital Biomarkers of Glycemic Health through the Creation of a Benchmark Dataset. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/34140.

Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.