Systematic comparison of published host gene expression signatures for bacterial/viral discrimination.



Measuring host gene expression is a promising diagnostic strategy to discriminate bacterial and viral infections. Multiple signatures of varying size, complexity, and target populations have been described. However, there is little information to indicate how the performance of the various published signatures compares.


This systematic comparison of host gene expression signatures evaluated the performance of 28 signatures, validating them in 4589 subjects from 51 publicly available datasets. Thirteen COVID-specific datasets with 1416 subjects were included in a separate analysis. Individual signature performance was evaluated using the area under the receiver operating characteristic curve (AUC). Overall signature performance was evaluated using median AUCs and accuracies.
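As a rough illustration of the evaluation described above, the following Python sketch computes an AUC for one signature in each of several datasets and then summarizes them with a median. It is only a minimal sketch under stated assumptions: the dataset names, labels, and scores are hypothetical toy values, not data from the study, and the rank-based AUC shown here is the standard textbook definition rather than the authors' actual pipeline.

```python
from statistics import median


def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive subject
    receives a higher score than a randomly chosen negative subject.
    Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: (true labels, signature scores) per dataset.
# 1 = infection class of interest (e.g. bacterial), 0 = everything else.
datasets = {
    "dataset_A": ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),
    "dataset_B": ([1, 0, 1, 0], [0.7, 0.6, 0.4, 0.2]),
    "dataset_C": ([1, 1, 0, 0, 0], [0.6, 0.2, 0.5, 0.1, 0.3]),
}

# One AUC per dataset, then a single median AUC for the signature.
per_dataset_auc = {name: auc(y, s) for name, (y, s) in datasets.items()}
median_auc = median(per_dataset_auc.values())

for name, value in per_dataset_auc.items():
    print(f"{name}: AUC = {value:.2f}")
print(f"median AUC = {median_auc:.2f}")
```

Summarizing with the median rather than the mean keeps a single unusually easy or unusually hard dataset from dominating a signature's overall score.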


Signature performance varied widely, with median AUCs ranging from 0.55 to 0.96 for bacterial classification and from 0.69 to 0.97 for viral classification. Signature size varied (1-398 genes), with smaller signatures generally performing more poorly (P < .04). Viral infection was easier to diagnose than bacterial infection (84% vs. 79% overall accuracy, respectively; P < .001). Host gene expression classifiers performed more poorly in some pediatric populations (3 months-1 year and 2-11 years) compared to the adult population for both bacterial infection (73% and 70% vs. 82%, respectively; P < .001) and viral infection (80% and 79% vs. 88%, respectively; P < .001). We did not observe classification differences based on illness severity as defined by ICU admission for bacterial or viral infections. The median AUC across all signatures for COVID-19 classification was 0.80 compared to 0.83 for viral classification in the same datasets.


In this systematic comparison of 28 host gene expression signatures, we observed differences based on a signature's size and characteristics of the validation population, including age and infection type. However, populations used for signature discovery did not impact performance, underscoring the redundancy among many of these signatures. Furthermore, differential performance in specific populations may only be observable through this type of large-scale validation.





Published Version (Please cite this version)


Publication Info

Bodkin, Nicholas, Melissa Ross, Micah T McClain, Emily R Ko, Christopher W Woods, Geoffrey S Ginsburg, Ricardo Henao, Ephraim L Tsalik, et al. (2022). Systematic comparison of published host gene expression signatures for bacterial/viral discrimination. Genome Medicine, 14(1), p. 18. 10.1186/s13073-022-01025-x Retrieved from

This is constructed from limited available data and may be imprecise. To cite this article, please review & use the official citation provided by the journal.



Micah Thomas McClain

Associate Professor of Medicine

Emily Ray Ko

Assistant Professor of Medicine

Clinical and translational research, COVID-19 therapeutics, clinical biomarkers for infectious disease.


Christopher Wildrick Woods

Wolfgang Joklik Distinguished Professor of Global Health

1. Emerging Infections
2. Global Health
3. Epidemiology of infectious diseases
4. Clinical microbiology and diagnostics
5. Bioterrorism Preparedness
6. Surveillance for communicable diseases
7. Antimicrobial resistance

Unless otherwise indicated, scholarly articles published by Duke faculty members are made available here with a CC-BY-NC (Creative Commons Attribution Non-Commercial) license, as enabled by the Duke Open Access Policy. If you wish to use the materials in ways not already permitted under CC-BY-NC, please consult the copyright owner. Other materials are made available here through the author’s grant of a non-exclusive license to make their work openly accessible.