Multi-label annotation of text reports from computed tomography of the chest, abdomen, and pelvis using deep learning.



Artificially intelligent systems for detecting abnormalities in body (chest, abdomen, and pelvis) computed tomography (CT) still need to become not only accurate but also able to handle the true breadth of findings that radiologists encounter. Currently, the major bottleneck for developing multi-disease classifiers is a lack of manually annotated data. The purpose of this work was to develop high-throughput multi-label annotators for body CT reports that can be applied across a variety of abnormalities, organs, and disease states, thereby mitigating the need for human annotation.


We used a dictionary approach to develop rule-based algorithms (RBA) for extraction of disease labels from radiology text reports. We targeted three organ systems (lungs/pleura, liver/gallbladder, kidneys/ureters) with four diseases per system based on their prevalence in our dataset. To expand the algorithms beyond pre-defined keywords, attention-guided recurrent neural networks (RNN) were trained using the RBA-extracted labels to classify reports as being positive for one or more diseases or normal for each organ system. We evaluated how disease classification performance was affected by random initialization versus pre-trained embeddings and by different sizes of training datasets. The RBA was tested on a subset of 2158 manually labeled reports and performance was reported as accuracy and F-score. The RNN was tested against a test set of 48,758 reports labeled by RBA and performance was reported as area under the receiver operating characteristic curve (AUC), with 95% CIs calculated using the DeLong method.
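The dictionary approach can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the keyword lists, the negation cues, the `label_report` helper, and the rule that an organ system with no matched disease is called normal are hypothetical and are not the dictionary or rules used in the study.

```python
import re

# Hypothetical keyword dictionary: organ system -> disease label -> trigger terms.
# Illustrative only; not the dictionary used in the study.
DICTIONARY = {
    "lungs/pleura": {
        "nodule": ["nodule", "nodules"],
        "effusion": ["pleural effusion", "effusion"],
    },
    "liver/gallbladder": {
        "stone": ["gallstone", "cholelithiasis"],
    },
}

# Simple negation cues, searched for in the sentence text before a matched term.
NEGATION = re.compile(r"\b(no|without|negative for|resolved)\b")


def label_report(text):
    """Return {organ system: sorted disease labels, or ['normal'] if none matched}."""
    sentences = [s.strip().lower() for s in re.split(r"[.\n]+", text) if s.strip()]
    result = {}
    for organ, diseases in DICTIONARY.items():
        found = set()
        for sentence in sentences:
            for disease, terms in diseases.items():
                for term in terms:
                    match = re.search(r"\b" + re.escape(term) + r"\b", sentence)
                    # Keep the label only if no negation cue precedes the term.
                    if match and not NEGATION.search(sentence[:match.start()]):
                        found.add(disease)
        # Simplifying assumption: no matched disease means the system is normal.
        result[organ] = sorted(found) if found else ["normal"]
    return result


example = (
    "Lungs: 4 mm nodule in the right upper lobe. No pleural effusion. "
    "Liver: cholelithiasis without cholecystitis."
)
labels = label_report(example)
```

Labels produced this way are the kind of weak supervision that a downstream text classifier could be trained on; a real dictionary would need far richer vocabulary and negation handling than this sketch.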


Manual validation of the RBA confirmed 91-99% accuracy across the 15 different labels. Our models extracted disease labels from 261,229 radiology reports of 112,501 unique subjects. Pre-trained models outperformed random initialization across all diseases. As the training dataset size was reduced, performance was robust except for a few diseases with a relatively small number of cases. Pre-trained classification AUCs reached > 0.95 for all four disease outcomes and normality across all three organ systems.
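For readers unfamiliar with the reported metrics, the sketch below computes accuracy, F-score, and AUC for a single binary label. The function names and toy inputs are hypothetical and are not study data; the AUC uses the rank-based (Mann-Whitney) formulation, and the DeLong confidence intervals used in the study are not implemented here.

```python
def accuracy_and_f1(predicted, reference):
    """Accuracy and F-score for one binary label (1 = positive, 0 = negative)."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    accuracy = (tp + tn) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


def auc(scores, labels):
    """AUC as the probability that a positive case outscores a negative one.

    Ties count as half a win. Assumes at least one positive and one negative.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy inputs for illustration only.
reference = [1, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.3, 0.2, 0.7]

acc, f1 = accuracy_and_f1(predicted, reference)
area = auc(scores, reference)
```

In a multi-label setting such as this one, these metrics would be computed separately for each disease label and for normality within each organ system.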


Our label-extracting pipeline was able to encompass a variety of cases and diseases in body CT reports by generalizing beyond strict rules with exceptional accuracy. The method described can be easily adapted to enable automated labeling of hospital-scale medical data sets for training image-based disease classifiers.





Published Version (Please cite this version)


Publication Info

D'Anniballe, Vincent M, Fakrul Islam Tushar, Khrystyna Faryna, Songyue Han, Maciej A Mazurowski, Geoffrey D Rubin and Joseph Y Lo (2022). Multi-label annotation of text reports from computed tomography of the chest, abdomen, and pelvis using deep learning. BMC Medical Informatics and Decision Making, 22(1). p. 102. doi:10.1186/s12911-022-01843-4

This is constructed from limited available data and may be imprecise. To cite this article, please review & use the official citation provided by the journal.


Fakrul Islam Tushar


Fakrul Islam Tushar is a second-year Ph.D. student in the Department of Electrical and Computer Engineering at Duke University. He is also a Research Associate at the Center for Virtual Imaging Trials primarily engaged in research, computer-aided diagnosis, and healthcare innovation using machine learning and image analysis-driven solutions. He graduated from the Erasmus+ Joint Master’s in Medical Imaging and Applications (Spain, Italy, and France) and served as a Post-Graduate Research Associate at the Carl E. Ravin Advanced Imaging Laboratories (RAI Labs) at Duke University Medical Center. He earned a Bachelor of Science degree in Electrical and Electronics Engineering at the American International University Bangladesh (AIUB).


Maciej A Mazurowski

Associate Professor in Radiology

Geoffrey D Rubin

Adjunct Professor in the Department of Radiology

Geoffrey D. Rubin, MD, MBA, FACR, FAHA, FSABI, FNASCI is the George B. Geller Distinguished Professor for Research in Cardiovascular Diseases and Professor of Radiology at Duke University.

Born in Los Angeles, California, he earned Bachelor of Science degrees with Honor in Chemistry and Biology from the California Institute of Technology in 1982 and an MD degree from the University of California, San Diego in 1987. He spent the next 22 years at Stanford University, where he completed a Diagnostic Radiology residency in 1992 and a Body Imaging fellowship in 1993 and, after joining the faculty in 1993, earned the rank of full Professor with university tenure in 2005.

Beginning in 1991, Dr. Rubin and colleagues at Stanford pioneered the development and use of spiral and multidetector-row CT for imaging the cardiovascular system. He is the lead author on the earliest scientific reports of CT angiography applied to a breadth of thoracic, abdominal, and peripheral vascular applications. Dr. Rubin founded and became Chief of Cardiovascular Imaging in 2000 where he fostered collaborative cross-disciplinary teams that facilitated a transformation from invasive to non-invasive CT-based diagnosis and treatment planning for many vascular disorders, notably the application of stent-grafts for aortic aneurysms.

Dr. Rubin co-founded the Stanford 3-D Medical Imaging Laboratory in 1996 and served as its Medical Director until 2010. His teams published the first descriptions of novel volumetric image presentations including perspective volume rendering as a basis for virtual endoscopy and curved planar reformations for blood vessel tracking and quantitation. The Stanford 3-D Laboratory established the first scalable clinical service facility for applying computer graphics and vision tools to medical imaging data, training hundreds of physicians and technologists to emulate the model worldwide.

From 2005 to 2010, Dr. Rubin served as Associate Dean for Clinical Affairs in the School of Medicine. He was also Associate Director of the Stanford Cardiovascular Institute from 2007 to 2010 and became the first Chief of Staff-Elect chosen by the medical staff of Stanford Hospital & Clinics, serving as Vice Chief of Staff from 2007 to 2010. Through these leadership roles, he contributed to the establishment of Stanford’s first comprehensive electronic health record system, the management of a newly centralized School of Medicine faculty funds flow, the transition to a self-governing medical staff organization, cross-departmental harmonization of clinical privileges, and policies formalizing the introduction of clinical innovations and management of conflicts of interest.

In 2010, Dr. Rubin assumed the role of Chairman of the Department of Radiology at Duke University where during his tenure he led the development and expansion of imaging services into new clinical and research facilities, increased diversity amongst departmental faculty and leadership, established a basis for an enterprise-wide imaging operation and infrastructure, and implemented an award-winning cross-disciplinary revenue integrity program.

His current work focuses on applications of artificial intelligence toward assisted interpretation of volumetric medical imaging, the contributions of perceptual variations to radiologist performance in volumetric image interpretation, and effective leadership and management in radiology and healthcare. In support of the latter focus, he is an avid mentor for radiology leaders and has developed and taught several national leadership training programs for the Radiology Leadership Institute of the American College of Radiology, where he has served as a founding Board Member since 2012. Dr. Rubin is also President and Board Chair of the International Society for Computed Tomography, Board Member of RAD-AID International, and is co-chair of the RSNA-ACR Public Information Website Committee overseeing

Dr. Rubin is Past President of the North American Society for Cardiovascular Imaging, the Society for Computed Body Tomography and Magnetic Resonance, and the Fleischner Society for Thoracic Imaging and Diagnosis. He is the author of over 200 peer-reviewed manuscripts and over 50 review articles and book chapters. He has edited five books, including the highly acclaimed textbook, CT and MR Angiography: Comprehensive Vascular Assessment. He holds six U.S. patents on medical image analysis and has served as Principal Investigator of three NIH R01s focused on imaging and analysis of cardiovascular and pulmonary diseases: “Measurement of the Aorta and its Branches” (1998-2003), “Efficient Interpretation of 3D Vascular Image Data” (2001-2007), and “Improving Radiologist Detection of Lung Nodules with CAD” (2004-2011). In 2008, he was named “Most Effective Radiology Educator.” He is an active public speaker, having made over 1000 presentations to medical, scientific, and lay audiences in over 40 countries.

In 1997, he co-founded Trivascular Inc, remaining actively engaged in support of its development of low-profile aortic stent-grafts until the company was bought in 2004. In 2011, he co-founded Informatics in Context, providing real-time automated adjudication of prior authorization requests via EDI 278. Over the last 21 years, Dr. Rubin has served as a consultant to numerous start-ups seeking to bring important innovations to the marketplace.

In 2014, Dr. Rubin received an MBA from the Fuqua School of Business at Duke University, was named a Fuqua Scholar, and was elected commencement speaker. He remains actively engaged with the Fuqua Health Sector Advisory Council and as a mentor to MBA students and recent graduates.


Joseph Yuan-Chieh Lo

Professor in Radiology

My research is at the intersection of computer vision, machine learning, and medical imaging, with a dual focus on mammography and computed tomography (CT). Together with our industry partner, we developed deep learning algorithms for breast cancer screening with 2D/3D mammography, and that product is now undergoing FDA approval with anticipated rollout to clinics worldwide. We also pioneer the creation of "digital twin" anatomical models from patient imaging data, using these models to forge new paths in CT scan analysis through virtual readers and deep learning techniques. Additionally, we're developing a computer-aided triage system for detecting diseases across multiple organs in body CT scans, leveraging hospital-scale datasets and integrating natural language processing with deep learning for comprehensive disease classification.

Unless otherwise indicated, scholarly articles published by Duke faculty members are made available here with a CC-BY-NC (Creative Commons Attribution Non-Commercial) license, as enabled by the Duke Open Access Policy. If you wish to use the materials in ways not already permitted under CC-BY-NC, please consult the copyright owner. Other materials are made available here through the author’s grant of a non-exclusive license to make their work openly accessible.