Statistical Analysis and Deep Learning Models for Immunoprofiling Based on Single-cell Data in Clinical Trials
Date
2025
Abstract
Recent advances in single-cell technologies have transformed immunoprofiling, enabling detailed characterization of immune heterogeneity, plasticity, and dynamics. However, the translational impact of these findings remains limited by methodological challenges, including the integration of unpaired multi-omics data, batch effects, and the difficulty of deriving accurate clinical predictions from small-cohort studies. This thesis describes three methodological innovations that together address these challenges and facilitate data-driven discovery in clinical immunology.
First, we proposed a framework that combines a Variational Autoencoder (VAE) with optimal transport (OT) to integrate and translate between unpaired single-cell multi-omics datasets (e.g., scRNA-seq and scATAC-seq). By incorporating the minimized aggregated Wasserstein (MAW) distance to align the latent space, the model learns embeddings that are modality-invariant while preserving biological and patient-level variability. It allows cells to be compared across platforms and translated across modalities even when direct cell-to-cell pairing is impossible.
Second, we introduced cytoDE, a fully automated pipeline for analyzing high-dimensional cytometry data from clinical trials. Instead of relying on traditional frequency-based measures, cytoDE quantifies inter-patient divergence as differences in the distribution of each marker's expression, using Gaussian Mixture Models and optimal transport-based distances. Applied to longitudinal single-cell cytometry data from a neoadjuvant immunotherapy trial in early-stage non-small cell lung cancer (NSCLC), cytoDE identified inferential and predictive biomarkers that traditional methods do not detect.
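The idea of comparing patients by the distribution of each marker, rather than by cell-type frequencies, can be illustrated with a minimal sketch. It is not the cytoDE implementation: it swaps the Gaussian Mixture Model step for raw empirical distributions and uses the one-dimensional Wasserstein-1 distance. The function names `wasserstein_1d` and `marker_divergence`, and the marker-to-values dictionary layout, are assumptions made for illustration.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two empirical 1D distributions.

    Computed as the area between the two empirical CDFs, so the two
    samples may contain different numbers of cells.
    """
    xs, ys = sorted(xs), sorted(ys)
    points = sorted(set(xs) | set(ys))
    total = 0.0
    i = j = 0  # number of values <= current left endpoint in xs / ys
    for left, right in zip(points, points[1:]):
        while i < len(xs) and xs[i] <= left:
            i += 1
        while j < len(ys) and ys[j] <= left:
            j += 1
        # The gap between the CDFs is constant on [left, right).
        total += abs(i / len(xs) - j / len(ys)) * (right - left)
    return total


def marker_divergence(patient_a, patient_b):
    """Per-marker distance between two patients (hypothetical layout).

    Each patient is a dict mapping a marker name to a list of
    per-cell expression values.
    """
    shared = sorted(set(patient_a) & set(patient_b))
    return {m: wasserstein_1d(patient_a[m], patient_b[m]) for m in shared}


# Toy data, invented for illustration only.
patient_a = {"CD4": [0.0, 1.0, 2.0], "CD8": [1.0, 1.0]}
patient_b = {"CD4": [1.0, 2.0, 3.0], "CD8": [1.0, 1.0, 1.0]}
divergence = marker_divergence(patient_a, patient_b)
```

In this toy case the CD4 distribution is shifted by one unit between the two patients while CD8 is unchanged, so only CD4 contributes a nonzero distance.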
Third, we introduced cytoGPNet, a novel deep learning model designed specifically to predict outcomes from longitudinal single-cell cytometry data. The model combines a denoising autoencoder with a Gaussian process (GP) module and attention-based temporal summarization to produce reliable, interpretable, and temporally informed clinical predictions. cytoGPNet outperformed state-of-the-art methods on several single-cell datasets (e.g., NSCLC, HIV, influenza, COVID-19) and was shown to be robust to batch effects, missing data, and varying cell numbers.
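Attention-based temporal summarization, in its generic form, collapses a sequence of per-time-point embeddings into one vector by taking a softmax-weighted average. The sketch below shows only that generic mechanism and is not cytoGPNet's architecture. The function name `attention_pool`, the dot-product scoring, and the fixed `query` vector (which would be learned in a trained model) are assumptions.

```python
import math


def attention_pool(embeddings, query):
    """Summarize time-point embeddings with softmax attention.

    embeddings: list of equal-length vectors, one per time point.
    query: vector of the same length; a learned parameter in a real
           model, fixed here for illustration.
    Returns (summary_vector, attention_weights).
    """
    # Dot-product relevance score for each time point.
    scores = [sum(e * q for e, q in zip(emb, query)) for emb in embeddings]
    # Numerically stable softmax over time points.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    weights = [e / norm for e in exps]
    # Weighted average of the embeddings.
    dim = len(embeddings[0])
    summary = [
        sum(w * emb[d] for w, emb in zip(weights, embeddings))
        for d in range(dim)
    ]
    return summary, weights


# Two hypothetical time points with 2-dimensional embeddings.
visits = [[1.0, 0.0], [0.0, 1.0]]
summary, weights = attention_pool(visits, query=[0.0, 0.0])
```

With an all-zero query every time point receives the same weight, so the summary reduces to the plain average. The weights themselves can be inspected to see which time points drive a prediction, which is one route to the interpretability mentioned above.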
Together, these methods provide versatile analytical pipelines that combine generative modeling, statistical inference, and deep learning to derive clinically actionable knowledge from complex single-cell data. The methodological advances span three essential dimensions: cross-modality integration, statistically grounded biomarker discovery, and interpretable predictive modeling. They provide a foundation for scalable, principled, and impactful single-cell analysis in precision immunology and translational medicine.
Citation
Zhang, Jingxuan (2025). Statistical Analysis and Deep Learning Models for Immunoprofiling Based on Single-cell Data in Clinical Trials. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/33370.
Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.
