A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples.

Abstract

Unbiased next-generation sequencing (NGS) approaches enable comprehensive pathogen detection in the clinical microbiology laboratory and have numerous applications for public health surveillance, outbreak investigation, and the diagnosis of infectious diseases. However, practical deployment of the technology is hindered by the bioinformatics challenge of analyzing results accurately and in a clinically relevant timeframe. Here we describe SURPI ("sequence-based ultrarapid pathogen identification"), a computational pipeline for pathogen identification from complex metagenomic NGS data generated from clinical samples, and demonstrate use of the pipeline in the analysis of 237 clinical samples comprising more than 1.1 billion sequences. Deployable on both cloud-based and standalone servers, SURPI leverages two state-of-the-art aligners for accelerated analyses, SNAP and RAPSearch, which are as accurate as existing bioinformatics tools but orders of magnitude faster in performance. In fast mode, SURPI detects viruses and bacteria by scanning data sets of 7-500 million reads in 11 min to 5 h, while in comprehensive mode, all known microorganisms are identified, followed by de novo assembly and protein homology searches for divergent viruses in 50 min to 16 h. SURPI has also directly contributed to real-time microbial diagnosis in acutely ill patients, underscoring its potential key role in the development of unbiased NGS-based clinical assays in infectious diseases that demand rapid turnaround times.

Citation

Published Version (Please cite this version)

10.1101/gr.171934.113

Publication Info

Naccache, Samia N, Scot Federman, Narayanan Veeraraghavan, Matei Zaharia, Deanna Lee, Erik Samayoa, Jerome Bouquet, Alexander L Greninger, et al. (2014). A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples. Genome Res, 24(7), pp. 1180–1192. doi:10.1101/gr.171934.113. Retrieved from https://hdl.handle.net/10161/13772.

This citation is constructed from limited available data and may be imprecise. To cite this article, please review and use the official citation provided by the journal.

Scholars@Duke

John Andrew Crump

Adjunct Professor in the Department of Medicine

I am an Adjunct Professor of Medicine, Pathology, and Global Health. My work with Duke University is primarily based in northern Tanzania where I am former Site Leader and current Principal Investigator on projects linked to Duke University’s collaborative research program at Kilimanjaro Christian Medical Centre. I oversee the design and implementation of research studies on infectious diseases, particularly febrile illness, invasive bacterial disease, zoonotic infections, and infectious diseases diagnostics. In addition, I am Professor of Medicine, Pathology, and Global Health at the University of Otago and a medical epidemiologist with the US Centers for Disease Control and Prevention (CDC). My CDC work focuses on non-malaria febrile illness.


Unless otherwise indicated, scholarly articles published by Duke faculty members are made available here with a CC-BY-NC (Creative Commons Attribution Non-Commercial) license, as enabled by the Duke Open Access Policy. If you wish to use the materials in ways not already permitted under CC-BY-NC, please consult the copyright owner. Other materials are made available here through the author’s grant of a non-exclusive license to make their work openly accessible.