A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples.
Abstract
Unbiased next-generation sequencing (NGS) approaches enable comprehensive pathogen
detection in the clinical microbiology laboratory and have numerous applications for
public health surveillance, outbreak investigation, and the diagnosis of infectious
diseases. However, practical deployment of the technology is hindered by the bioinformatics
challenge of analyzing results accurately and in a clinically relevant timeframe.
Here we describe SURPI ("sequence-based ultrarapid pathogen identification"), a computational
pipeline for pathogen identification from complex metagenomic NGS data generated from
clinical samples, and demonstrate use of the pipeline in the analysis of 237 clinical
samples comprising more than 1.1 billion sequences. Deployable on both cloud-based
and standalone servers, SURPI leverages two state-of-the-art aligners for accelerated
analyses, SNAP and RAPSearch, which are as accurate as existing bioinformatics tools
but orders of magnitude faster in performance. In fast mode, SURPI detects viruses
and bacteria by scanning data sets of 7-500 million reads in 11 min to 5 h, while
in comprehensive mode, all known microorganisms are identified, followed by de novo
assembly and protein homology searches for divergent viruses in 50 min to 16 h. SURPI
has also directly contributed to real-time microbial diagnosis in acutely ill patients,
underscoring its potential key role in the development of unbiased NGS-based clinical
assays in infectious diseases that demand rapid turnaround times.
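As a rough illustration of the fast-mode figures quoted above, the following sketch converts the stated read counts and runtimes into average throughput. It assumes the smallest data set (7 million reads) corresponds to the shortest runtime (11 min) and the largest (500 million reads) to the longest (5 h); the abstract does not state this pairing explicitly, and the function and variable names are illustrative only, not part of SURPI.

```python
# Rough throughput implied by the fast-mode figures in the abstract.
# Assumption (not stated in the abstract): 7 million reads pairs with 11 min,
# and 500 million reads pairs with 5 h.

def reads_per_minute(reads: int, minutes: float) -> float:
    """Average number of reads processed per minute."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return reads / minutes

# (read count, runtime in minutes) for each end of the quoted range
fast_mode = {
    "smallest data set": (7_000_000, 11),
    "largest data set": (500_000_000, 5 * 60),
}

fast_mode_throughput = {
    label: reads_per_minute(reads, minutes)
    for label, (reads, minutes) in fast_mode.items()
}

for label, rate in fast_mode_throughput.items():
    print(f"{label}: ~{rate / 1e6:.2f} million reads/min")
```

Under that pairing, the implied average rate is on the order of 0.6 to 1.7 million reads per minute. Actual throughput would depend on hardware, sample composition, and reference database size, none of which are given here.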
Type
Journal article
Subject
Computational Biology
Databases, Nucleic Acid
High-Throughput Nucleotide Sequencing
Humans
Metagenomics
ROC Curve
Reproducibility of Results
Software
Permalink
https://hdl.handle.net/10161/13772
Published Version (Please cite this version)
10.1101/gr.171934.113
Publication Info
Naccache, Samia N; Federman, Scot; Veeraraghavan, Narayanan; Zaharia, Matei; Lee, Deanna; Samayoa, Erik; ... Chiu, Charles Y (2014). A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples. Genome Res, 24(7). pp. 1180-1192. 10.1101/gr.171934.113. Retrieved from https://hdl.handle.net/10161/13772.
This is constructed from limited available data and may be imprecise. To cite this article, please review & use the official citation provided by the journal.
Scholars@Duke
John Andrew Crump
Adjunct Professor in the Department of Medicine
I am based in northern Tanzania where I am Site Leader for Duke University’s
collaborative research program based at Kilimanjaro Christian Medical Centre and Director
of Tanzania Operations for the Duke Global Health Institute. I oversee the design
and implementation of research studies on infectious diseases, particularly febrile
illness, invasive bacterial disease, HIV-associated opportunistic infections, clinical
trials of antiretroviral therapy and prevention of mother-to-child tr
