Multi-label annotation of text reports from computed tomography of the chest, abdomen, and pelvis using deep learning.

dc.contributor.author

D'Anniballe, Vincent M

dc.contributor.author

Tushar, Fakrul Islam

dc.contributor.author

Faryna, Khrystyna

dc.contributor.author

Han, Songyue

dc.contributor.author

Mazurowski, Maciej A

dc.contributor.author

Rubin, Geoffrey D

dc.contributor.author

Lo, Joseph Y

dc.date.accessioned

2023-08-14T22:44:32Z

dc.date.available

2023-08-14T22:44:32Z

dc.date.issued

2022-04

dc.date.updated

2023-08-14T22:44:28Z

dc.description.abstract

Background

Artificially intelligent systems for abnormality detection must be not only accurate but also able to handle the true breadth of findings that radiologists encounter in body (chest, abdomen, and pelvis) computed tomography (CT). Currently, the major bottleneck in developing multi-disease classifiers is a lack of manually annotated data. The purpose of this work was to develop high-throughput multi-label annotators for body CT reports that can be applied across a variety of abnormalities, organs, and disease states, thereby mitigating the need for human annotation.

Methods

We used a dictionary approach to develop rule-based algorithms (RBA) for extraction of disease labels from radiology text reports. We targeted three organ systems (lungs/pleura, liver/gallbladder, kidneys/ureters) with four diseases per system, chosen based on their prevalence in our dataset. To expand the algorithms beyond pre-defined keywords, attention-guided recurrent neural networks (RNN) were trained using the RBA-extracted labels to classify reports as being positive for one or more diseases or normal for each organ system. We also evaluated how disease classification performance was affected by random initialization versus pre-trained embeddings and by different training dataset sizes. The RBA was tested on a subset of 2158 manually labeled reports, and performance was reported as accuracy and F-score. The RNN was evaluated on a test set of 48,758 reports labeled by the RBA, and performance was reported as area under the receiver operating characteristic curve (AUC), with 95% CIs calculated using the DeLong method.
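As an illustration of the dictionary approach described above, the following is a minimal sketch of a rule-based labeler for one organ system. It is not the authors' implementation: the term lists in `DISEASE_TERMS`, the negation cues in `NEGATION`, and the function name `label_report` are all assumptions made for this example, and a real system would need far richer vocabularies and negation handling.

```python
import re

# Hypothetical keyword dictionary (illustrative only; not the paper's term lists).
DISEASE_TERMS = {
    "lungs/pleura": {
        "atelectasis": ["atelectasis", "atelectatic"],
        "nodule": ["nodule", "nodules"],
        "emphysema": ["emphysema"],
        "effusion": ["pleural effusion", "effusion"],
    },
}

# Assumed negation cues; a match before a term in the same sentence suppresses it.
NEGATION = re.compile(r"\b(no|without|negative for|resolved)\b", re.IGNORECASE)


def label_report(text, organ_system):
    """Return a dict of binary labels (one per disease, plus 'normal')."""
    diseases = DISEASE_TERMS[organ_system]
    labels = {disease: 0 for disease in diseases}
    # Naive sentence split on end punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for disease, terms in diseases.items():
            for term in terms:
                pattern = r"\b" + re.escape(term) + r"\b"
                match = re.search(pattern, sentence, re.IGNORECASE)
                # Count the term only if no negation cue precedes it.
                if match and not NEGATION.search(sentence[:match.start()]):
                    labels[disease] = 1
    # The organ system is 'normal' only when no disease label fired.
    labels["normal"] = int(not any(labels.values()))
    return labels


example = label_report(
    "No pleural effusion. Bibasilar atelectasis is present.", "lungs/pleura"
)
clear = label_report("The lungs are clear.", "lungs/pleura")
```

In this sketch each organ system yields five binary labels (four diseases plus normal), which matches the 15 labels across three organ systems reported in the abstract. Labels produced this way could then serve as training targets for a text classifier such as the RNN described above.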

Results

Manual validation of the RBA confirmed 91-99% accuracy across the 15 different labels. Our models extracted disease labels from 261,229 radiology reports of 112,501 unique subjects. Pre-trained models outperformed random initialization across all diseases. As the training dataset size was reduced, performance was robust except for a few diseases with a relatively small number of cases. Pre-trained classification AUCs reached > 0.95 for all four disease outcomes and normality across all three organ systems.

Conclusions

Our label-extracting pipeline was able to encompass a variety of cases and diseases in body CT reports by generalizing beyond strict rules with exceptional accuracy. The method described can be easily adapted to enable automated labeling of hospital-scale medical datasets for training image-based disease classifiers.

dc.identifier

10.1186/s12911-022-01843-4

dc.identifier.issn

1472-6947

dc.identifier.uri

https://hdl.handle.net/10161/28724

dc.language

eng

dc.publisher

Springer Science and Business Media LLC

dc.relation.ispartof

BMC medical informatics and decision making

dc.relation.isversionof

10.1186/s12911-022-01843-4

dc.subject

Abdomen

dc.subject

Pelvis

dc.subject

Humans

dc.subject

Tomography, X-Ray Computed

dc.subject

Deep Learning

dc.subject

Neural Networks, Computer

dc.title

Multi-label annotation of text reports from computed tomography of the chest, abdomen, and pelvis using deep learning.

dc.type

Journal article

duke.contributor.orcid

Tushar, Fakrul Islam|0000-0001-7180-563X

duke.contributor.orcid

Rubin, Geoffrey D|0000-0002-3820-2500

duke.contributor.orcid

Lo, Joseph Y|0000-0002-9540-5072

pubs.begin-page

102

pubs.issue

1

pubs.organisational-group

Duke

pubs.organisational-group

Pratt School of Engineering

pubs.organisational-group

School of Medicine

pubs.organisational-group

Student

pubs.organisational-group

Clinical Science Departments

pubs.organisational-group

Institutes and Centers

pubs.organisational-group

Biomedical Engineering

pubs.organisational-group

Electrical and Computer Engineering

pubs.organisational-group

Radiology

pubs.organisational-group

Duke Cancer Institute

pubs.publication-status

Published

pubs.volume

22

Files

Original bundle

Name: Multi-label annotation of text reports from computed tomography of the chest, abdomen, and pelvis using deep learning.pdf
Size: 4.03 MB
Format: Adobe Portable Document Format
Description: Published version