Towards Fully Automated Interpretation of Volumetric Medical Images with Deep Learning

Computed tomography (CT) is a medical imaging technique used for the diagnosis and management of numerous conditions, including cancer, fractures, and infections. Automated interpretation of CT scans using deep learning holds immense promise, as it may accelerate the radiology workflow, bring radiology expertise to underserved areas, and reduce missed diagnoses caused by human error. However, several obstacles have thus far prevented deployment of fully automated CT interpretation systems: (1) the difficulty of acquiring and preparing CT volumes; (2) the arduousness of manually acquiring structured abnormality labels needed to train models; (3) the question of how to construct high-performing models for CT interpretation; and (4) the need for explainable models. In this thesis, I address all four challenges.

First, I curated the RAD-ChestCT data set of 36,316 volumes from 19,993 unique patients. I downloaded whole CT volumes in DICOM format using an API developed for the Duke vendor neutral archive. Then I developed the first end-to-end Python pipeline for CT preprocessing, which converts each CT scan from a collection of per-slice DICOM files into a clean 3D NumPy array compatible with major machine learning frameworks. At present, RAD-ChestCT is the largest multiply-annotated volumetric medical imaging data set in the world.
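The core of such a preprocessing pipeline can be sketched as follows. This is a minimal illustration, not the thesis code: it assumes pydicom-style slice objects exposing `ImagePositionPatient`, `RescaleSlope`, `RescaleIntercept`, and `pixel_array`, and the function name is hypothetical.

```python
import numpy as np

def stack_ct_slices(slices):
    """Sort per-slice DICOM objects by z-position and stack them into a
    3D volume of Hounsfield units with shape (n_slices, height, width).

    Assumes each slice exposes a pydicom-style interface:
    ImagePositionPatient (x, y, z), RescaleSlope, RescaleIntercept,
    and a pixel_array of raw detector values.
    """
    # Order slices along the scanner's z-axis; DICOM files on disk are
    # often unordered, so sorting is essential for a coherent volume.
    ordered = sorted(slices, key=lambda s: float(s.ImagePositionPatient[2]))
    volume = np.stack([s.pixel_array.astype(np.float32) for s in ordered])
    # Convert raw values to Hounsfield units via the DICOM rescale equation.
    slope = float(ordered[0].RescaleSlope)
    intercept = float(ordered[0].RescaleIntercept)
    return volume * slope + intercept
```

A full pipeline would additionally resample to a uniform voxel spacing and clip or normalize intensities; the stacking and rescaling above are the steps every downstream framework depends on.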

Next, to obtain high-quality labels suitable for training a multiple abnormality classifier, I developed SARLE, a rule-based expert system for automatically extracting abnormality × location labels from free-text radiology reports. SARLE is the first approach to obtain both abnormality and location information from reports, and it obtains high performance, with an average abnormality F-score of 97.6.
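The flavor of such a rule-based extractor can be conveyed with a toy sketch. The vocabularies and negation rule below are illustrative stand-ins, far simpler than SARLE's published rule set:

```python
import re

# Toy term vocabularies; a real system uses far richer term sets and rules.
ABNORMALITIES = {"nodule": ["nodule", "nodular"], "effusion": ["effusion"]}
LOCATIONS = {"lung": ["lung", "pulmonary", "lobe"], "pleura": ["pleural"]}
NEGATION = re.compile(r"\b(no|without|absence of)\b")

def extract_labels(report):
    """Return a set of (abnormality, location) pairs found in
    non-negated sentences of a free-text report."""
    labels = set()
    for sentence in re.split(r"[.;]", report.lower()):
        if NEGATION.search(sentence):
            continue  # a real system would scope negation more carefully
        found_abn = [a for a, terms in ABNORMALITIES.items()
                     if any(t in sentence for t in terms)]
        found_loc = [l for l, terms in LOCATIONS.items()
                     if any(t in sentence for t in terms)]
        for a in found_abn:
            for l in found_loc:
                labels.add((a, l))
    return labels
```

Applied to a sentence like "There is a nodule in the right lower lobe. No pleural effusion.", this yields the positive pair (nodule, lung) while correctly suppressing the negated effusion finding.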

A fundamental form of CT interpretation is identification of all abnormalities in a scan; however, prior work has focused on only one class of abnormalities at a time. To address this gap, I developed the CT-Net model, a deep CNN for multiple abnormality prediction from whole volumes. CT-Net achieves an AUROC >90 for 18 abnormalities, with an average AUROC of 77.3 across all 83 abnormalities. Furthermore, training on more abnormalities significantly improves performance: for a subset of 9 labels, the model's average AUROC increased by 10% when the number of training labels was increased from 9 to all 83.
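The AUROC reported above has a useful probabilistic reading: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A minimal NumPy sketch (not the thesis evaluation code) makes this concrete:

```python
import numpy as np

def auroc(labels, scores):
    """AUROC computed as the probability that a randomly chosen positive
    case scores higher than a randomly chosen negative case, with ties
    counted as half a win (the Mann-Whitney U formulation)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    # Compare every positive score against every negative score.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))
```

For a multi-label model like CT-Net, this is computed once per abnormality and then averaged across the 83 labels.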

One limitation of CT-Net is its lack of explainability. I thus propose AxialNet, a CNN that leverages multiple instance learning to enable identification of key axial slices. Next, I identify a serious problem with the popular model explanation method Grad-CAM: Grad-CAM sometimes creates the false impression that the model has focused on the wrong organ. To address this problem, I propose HiResCAM, a novel explanation method for CNNs. I prove that HiResCAM is a generalization of the CAM method and has the intuitive interpretation of highlighting exactly which locations in the input volume lead to an increased score for a particular abnormality, for any CNN ending in a single fully connected layer. Finally, I combine HiResCAM with PARTITION, an approach to obtain allowed regions for each abnormality without manual labeling, to create a mask loss that yields a 37% improvement in organ localization of abnormalities.
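The distinction between the two explanation methods is compact enough to sketch. Given a feature map `acts` of shape (channels, H, W) and the gradient `grads` of a class score with respect to it, Grad-CAM first collapses each channel's gradient to a single scalar weight, which can smear attribution across locations the gradient never touched; HiResCAM instead multiplies gradients and activations element-wise before summing over channels. This is an illustrative NumPy sketch, not the published implementation:

```python
import numpy as np

def grad_cam(acts, grads):
    """Grad-CAM: average each channel's gradient into one scalar weight,
    then form a weighted sum of the activation channels."""
    weights = grads.mean(axis=(1, 2))            # one weight per channel
    return np.einsum("c,chw->hw", weights, acts)

def hirescam(acts, grads):
    """HiResCAM: element-wise gradient-activation product summed over
    channels, so the map is nonzero only where the gradient is."""
    return (grads * acts).sum(axis=0)
```

With a uniform activation map and a gradient concentrated at a single location, Grad-CAM spreads attribution evenly across the whole map, while HiResCAM highlights only the location that actually influenced the score, which is the behavior the localization results above depend on.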

Overall, this work advances CNN explanation methods and the clinical applicability of multi-abnormality modeling in volumetric medical images, contributing to the goal of fully automated systems for CT interpretation.





Draelos, Rachel Lea Ballantyne (2021). Towards Fully Automated Interpretation of Volumetric Medical Images with Deep Learning. Dissertation, Duke University.


Duke's student scholarship is made available to the public using a Creative Commons Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND) license.