Demographic distribution matching between real-world and virtual phantom population.

dc.contributor.author

Ghosh, Dhrubajyoti

dc.contributor.author

Tushar, Fakrul

dc.contributor.author

Dahal, Lavsen

dc.contributor.author

Vancoillie, Liesbeth

dc.contributor.author

Lafata, Kyle J

dc.contributor.author

Samei, Ehsan

dc.contributor.author

Lo, Joseph Y

dc.contributor.author

Luo, Sheng

dc.date.accessioned

2026-04-01T19:19:42Z

dc.date.available

2026-04-01T19:19:42Z

dc.date.issued

2026-03

dc.description.abstract

Background

The adoption of virtual imaging trials (VITs) is rapidly expanding, offering a cost-effective and ethically viable alternative to large-scale clinical trials for imaging system evaluation. However, differences in demographic composition between virtual phantom populations and real-world clinical cohorts can introduce bias in imaging performance assessments, particularly for underrepresented populations. Such discrepancies, if unaddressed, can limit the translational relevance of VIT findings by misrepresenting diagnostic performance across diverse patient groups.

Purpose

To address this limitation, we introduce DISTINCT (Distributional Subsampling for Covariate-Targeted Alignment), a statistical framework for selecting demographically aligned subsamples from large clinical datasets to support robust comparisons with virtual cohorts.

Methods

We applied DISTINCT to the National Lung Screening Trial (NLST) and a companion virtual trial dataset (VLST). The algorithm jointly aligned continuous (age, BMI) and categorical (sex, race, ethnicity) variables by constructing multidimensional bins based on discretized covariates. For a given target size, DISTINCT samples individuals to match the joint demographic distribution of the reference population. We evaluated the demographic similarity between VLST and progressively larger NLST subsamples using Wasserstein and Kolmogorov-Smirnov (K-S) distances to identify the maximal subsample size with acceptable alignment. After demographic alignment, we evaluated lung cancer risk prediction performance by applying two established NLST risk scores to the aligned subsamples and assessing their stability with receiver operating characteristic (ROC) analysis.
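
The binning and sampling steps described above can be illustrated with a minimal sketch. This is not the authors' implementation: the record layout (dictionaries with `age`, `bmi`, `sex`, `race`, `ethnicity` keys), the bin widths, the proportional-quota rule, and all function names are assumptions made for illustration, and the distances are computed on a single continuous covariate because the abstract does not say how they were applied across variables.

```python
import random
from bisect import bisect_right
from collections import Counter, defaultdict

# Hypothetical bin widths: the abstract does not state the discretization used.
AGE_WIDTH = 5
BMI_WIDTH = 5


def joint_bin(person):
    """Map one record to a multidimensional bin of discretized covariates."""
    return (
        int(person["age"] // AGE_WIDTH),
        int(person["bmi"] // BMI_WIDTH),
        person["sex"],
        person["race"],
        person["ethnicity"],
    )


def aligned_subsample(source, reference, target_size, seed=0):
    """Draw about target_size records from source so that joint-bin
    proportions follow those of reference (assumed proportional quotas)."""
    rng = random.Random(seed)
    ref_counts = Counter(joint_bin(p) for p in reference)
    n_ref = sum(ref_counts.values())

    pools = defaultdict(list)
    for p in source:
        pools[joint_bin(p)].append(p)

    sample = []
    for bin_key, count in ref_counts.items():
        quota = round(target_size * count / n_ref)
        pool = pools.get(bin_key, [])
        # A bin with too few source records caps the draw at the pool size.
        sample.extend(rng.sample(pool, min(quota, len(pool))))
    return sample


def ks_statistic(a, b):
    """Two-sample K-S distance: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in set(a) | set(b)
    )


def wasserstein_1d(a, b):
    """1-D Wasserstein-1 distance: area between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    total = 0.0
    for left, right in zip(points, points[1:]):
        gap = abs(bisect_right(a, left) / len(a) - bisect_right(b, left) / len(b))
        total += gap * (right - left)
    return total
```

Under these assumptions, one could call `aligned_subsample` with increasing `target_size` values and compare, for example, the ages of each subsample against the reference ages with `ks_statistic` and `wasserstein_1d`. In this sketch a sparsely populated bin caps the draw at its pool size, so larger targets drift further from the reference proportions; whether DISTINCT handles sparse bins this way is not stated in the abstract.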

Results

The DISTINCT algorithm identified a maximal demographically aligned NLST subsample of 9974 participants that preserved similarity to the VLST population. To assess whether such aligned subsets were sufficient for downstream applications, we applied two established NLST lung cancer risk scores and evaluated their performance using ROC analysis. Area under the curve (AUC) estimates stabilized once subsample sizes exceeded approximately 6000 participants, demonstrating that moderately sized aligned subsets provide reliable predictive model evaluation. Stratified analyses revealed demographic-specific variations in AUC, underscoring the importance of covariate alignment for fair and representative comparisons.
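
The stability check on AUC estimates can be sketched as follows. This is a hedged illustration only: the function names, the use of nested random subsamples, and the pairwise (Mann-Whitney) AUC estimator are assumptions, and the abstract does not specify how subsamples were drawn or how AUC was computed.

```python
import random


def roc_auc(scores, labels):
    """AUC as the probability that a positive case outscores a negative one
    (ties count one half). Quadratic in sample size, fine for a sketch."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_subsample_size(scores, labels, sizes, seed=0):
    """Estimate AUC on nested random subsamples of increasing size
    (hypothetical procedure for inspecting when estimates level off)."""
    rng = random.Random(seed)
    order = list(range(len(scores)))
    rng.shuffle(order)
    results = {}
    for n in sizes:
        chosen = order[:n]
        results[n] = roc_auc(
            [scores[i] for i in chosen],
            [labels[i] for i in chosen],
        )
    return results
```

Plotting the returned values against subsample size is one way to see where estimates stop changing appreciably; the threshold reported in the abstract comes from the authors' own analysis, not from this sketch.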

Conclusion

DISTINCT provides a statistically rigorous and scalable approach for covariate alignment between real and virtual imaging cohorts based on demographic factors of variability. Although demonstrated for lung cancer screening with low-dose CT, the framework is broadly applicable to other imaging modalities and diseases, and across wide ranges of factors of variability. By enabling fair and representative performance assessments, DISTINCT advances the integration of VITs into imaging research and protocol optimization workflows.

dc.identifier.issn

0094-2405

dc.identifier.issn

2473-4209

dc.identifier.uri

https://hdl.handle.net/10161/34351

dc.language

eng

dc.publisher

Wiley

dc.relation.ispartof

Medical physics

dc.relation.isversionof

10.1002/mp.70364

dc.rights.uri

https://creativecommons.org/licenses/by-nc/4.0

dc.subject

Humans

dc.subject

Lung Neoplasms

dc.subject

Phantoms, Imaging

dc.subject

Demography

dc.subject

Aged

dc.subject

Middle Aged

dc.subject

Female

dc.subject

Male

dc.title

Demographic distribution matching between real-world and virtual phantom population.

dc.type

Journal article

duke.contributor.orcid

Lo, Joseph Y|0000-0002-9540-5072

duke.contributor.orcid

Luo, Sheng|0000-0003-4214-5809

pubs.begin-page

e70364

pubs.issue

3

pubs.organisational-group

Duke

pubs.organisational-group

Pratt School of Engineering

pubs.organisational-group

School of Medicine

pubs.organisational-group

Trinity College of Arts & Sciences

pubs.organisational-group

Basic Science Departments

pubs.organisational-group

Clinical Science Departments

pubs.organisational-group

Institutes and Centers

pubs.organisational-group

Biostatistics & Bioinformatics

pubs.organisational-group

Biomedical Engineering

pubs.organisational-group

Pierre R. Lamond Department of Electrical and Computer Engineering

pubs.organisational-group

Pathology

pubs.organisational-group

Radiation Oncology

pubs.organisational-group

Radiology

pubs.organisational-group

Duke Cancer Institute

pubs.organisational-group

Mathematics

pubs.organisational-group

Biostatistics & Bioinformatics, Division of Biostatistics

pubs.publication-status

Published

pubs.volume

53

Files

Original bundle

Name: 2026Ghosh_et_al2026Medical-Physics.pdf
Size: 704.63 KB
Format: Adobe Portable Document Format
Description: Published version