Systematic comparison of published host gene expression signatures for bacterial/viral discrimination.



Measuring host gene expression is a promising diagnostic strategy to discriminate bacterial and viral infections. Multiple signatures that vary in size, complexity, and target population have been described. However, there is little information to indicate how the performance of the various published signatures compares.


This systematic comparison of host gene expression signatures evaluated the performance of 28 signatures, validating them in 4589 subjects from 51 publicly available datasets. Thirteen COVID-specific datasets with 1416 subjects were included in a separate analysis. Individual signature performance was evaluated using the area under the receiver operating characteristic curve (AUC) value. Overall signature performance was evaluated using median AUCs and accuracies.
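As a rough illustration of the evaluation described above, the sketch below computes an AUC for one signature in each validation dataset and then summarizes the signature by its median AUC. This is not the authors' pipeline: the function names `auc` and `median_auc`, the pairwise (Mann-Whitney) formulation of the AUC, and the toy labels and scores are all assumptions made for the example.

```python
from statistics import median


def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties count as half a win (Mann-Whitney formulation). `labels` holds
    1 for the target class (e.g. bacterial) and 0 for everything else.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative subject")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def median_auc(datasets):
    """Summarize one signature by the median of its per-dataset AUCs."""
    return median(auc(labels, scores) for labels, scores in datasets)


# Hypothetical toy data: (labels, signature scores) for three datasets.
toy_datasets = [
    ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),
    ([1, 0, 1, 0], [0.6, 0.7, 0.8, 0.2]),
    ([1, 0, 0, 1], [0.4, 0.5, 0.6, 0.7]),
]

per_dataset = [auc(labels, scores) for labels, scores in toy_datasets]
overall = median_auc(toy_datasets)
print(per_dataset, overall)
```

Using the median rather than the mean keeps a single unusually easy or hard dataset from dominating the summary, which matters when the validation datasets differ widely in size and population.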


Signature performance varied widely, with median AUCs ranging from 0.55 to 0.96 for bacterial classification and from 0.69 to 0.97 for viral classification. Signature size varied (1-398 genes), with smaller signatures generally performing more poorly (P < .04). Viral infection was easier to diagnose than bacterial infection (84% vs. 79% overall accuracy, respectively; P < .001). Host gene expression classifiers performed more poorly in some pediatric populations (3 months-1 year and 2-11 years) compared to the adult population for both bacterial infection (73% and 70% vs. 82%, respectively; P < .001) and viral infection (80% and 79% vs. 88%, respectively; P < .001). We did not observe classification differences based on illness severity, as defined by ICU admission, for bacterial or viral infections. The median AUC across all signatures for COVID-19 classification was 0.80, compared to 0.83 for viral classification in the same datasets.


In this systematic comparison of 28 host gene expression signatures, we observed differences based on a signature's size and characteristics of the validation population, including age and infection type. However, populations used for signature discovery did not impact performance, underscoring the redundancy among many of these signatures. Furthermore, differential performance in specific populations may only be observable through this type of large-scale validation.





Published Version (Please cite this version)


Publication Info

Bodkin, Nicholas, Melissa Ross, Micah T McClain, Emily R Ko, Christopher W Woods, Geoffrey S Ginsburg, Ricardo Henao, Ephraim L Tsalik, et al. (2022). Systematic comparison of published host gene expression signatures for bacterial/viral discrimination. Genome medicine, 14(1). p. 18. 10.1186/s13073-022-01025-x Retrieved from

This is constructed from limited available data and may be imprecise. To cite this article, please review & use the official citation provided by the journal.



Micah Thomas McClain

Associate Professor of Medicine

Emily Ray Ko

Assistant Professor of Medicine

Clinical and translational research, COVID-19 therapeutics, clinical biomarkers for infectious disease.


Christopher Wildrick Woods

Wolfgang Joklik Distinguished Professor of Global Health

1. Emerging Infections
2. Global Health
3. Epidemiology of infectious diseases
4. Clinical microbiology and diagnostics
5. Bioterrorism Preparedness
6. Surveillance for communicable diseases
7. Antimicrobial resistance


Ephraim Tsalik

Adjunct Associate Professor in the Department of Medicine

My research at Duke has focused on understanding the dynamic between host and pathogen so as to discover and develop host-response markers that can diagnose and predict health and disease.  This new and evolving approach to diagnosing illness has the potential to significantly impact individual as well as public health considering the rise of antibiotic resistance.

With any potential infectious disease diagnosis, it is difficult, if not impossible, to determine at the time of presentation what the underlying cause of illness is.  For example, acute respiratory illness is among the most frequent reasons for patients to seek care. These symptoms, such as cough, sore throat, and fever may be due to a bacterial infection, viral infection, both, or a non-infectious condition such as asthma or allergies.  Given the difficulties in making the diagnosis, most patients are inappropriately given antibacterials.  However, each of these etiologies (bacteria, virus, or something else entirely) leaves a fingerprint embedded in the host’s response. We are very interested in finding those fingerprints and exploiting them to generate new approaches to understand, diagnose, and manage disease.

These principles also apply to sepsis, defined as life-threatening organ dysfunction caused by a dysregulated host response to infection. Just as with acute respiratory illness, it is often difficult to identify whether infection is responsible for a patient’s critical illness.  We have embarked on a number of research programs that aim to better identify sepsis, define sepsis subtypes that can be used to guide future clinical research, and better predict sepsis outcomes.  These efforts have focused on many systems biology modalities including transcriptomics, miRNA, metabolomics, and proteomics.  Consequently, our Data Science team has used these highly complex data to develop new statistical methods, benefiting both the clinical and statistical research communities.

These examples are just a small sampling of the breadth of research Dr. Tsalik and his colleagues have conducted.  

In April 2022, Dr. Tsalik joined Danaher Diagnostics as the VP and Chief Scientific Officer for Infectious Disease, where he is applying this experience in biomarkers and diagnostics to shape the future of diagnostics in ID.

Unless otherwise indicated, scholarly articles published by Duke faculty members are made available here with a CC-BY-NC (Creative Commons Attribution Non-Commercial) license, as enabled by the Duke Open Access Policy. If you wish to use the materials in ways not already permitted under CC-BY-NC, please consult the copyright owner. Other materials are made available here through the author’s grant of a non-exclusive license to make their work openly accessible.