A Radiomics-Embedded Vision Transformer for Breast Cancer Ultrasound Image Classification Efficiency Improvement

Limited Access
This item is unavailable until:
2026-06-06

Date

2024


Abstract

Purpose: To develop a radiomics-embedded vision transformer (RE-ViT) model by incorporating radiomics features into the ViT architecture, seeking to improve the model's efficiency in medical image recognition and enhance breast ultrasound diagnostic accuracy.

Materials and Methods: Following the classic ViT design, the input image was first resampled into multiple 16×16 grid image patches. For each patch, 56-dimensional habitat radiomics features, including intensity-based, Gray Level Co-Occurrence Matrix (GLCOM)-based, and Gray Level Run-Length Matrix (GLRLM)-based features, were extracted. These features were designed to comprehensively encode local-regional intensity and texture information. The extracted features underwent a linear projection to a higher-dimensional space, integrating them with ViT's standard image embedding process. This integration involved an element-wise addition of the radiomics embedding with ViT's projection-based and positional embeddings. The resulting combined embeddings were then processed through a Transformer encoder and a Multilayer Perceptron (MLP) head block, adhering to the original ViT architecture. The proposed RE-ViT model was studied using the public BUSI breast ultrasound dataset of 399 patients with benign, malignant, and normal tissue classifications. The comparison study included: (1) RE-ViT versus classic ViT trained from scratch, (2) pre-trained RE-ViT versus pre-trained ViT (based on ImageNet-21k), and (3) RE-ViT versus a VGG-16 CNN model. Model performance was evaluated on accuracy, ROC AUC, sensitivity, and specificity with 10-fold Monte Carlo cross-validation.

Results: The RE-ViT model significantly outperformed the classic ViT model, demonstrating superior overall performance with accuracy = 0.718±0.043, ROC AUC = 0.848±0.033, sensitivity = 0.718±0.059, and specificity = 0.859±0.048. In contrast, the classic ViT model achieved accuracy = 0.473±0.050, ROC AUC = 0.644±0.062, sensitivity = 0.473±0.101, and specificity = 0.737±0.065. The pre-trained RE-ViT also showed enhanced performance (accuracy = 0.864±0.031, ROC AUC = 0.950±0.021, sensitivity = 0.864±0.074, specificity = 0.932±0.036) compared to the pre-trained ViT (accuracy = 0.675±0.111, ROC AUC = 0.872±0.086, sensitivity = 0.675±0.129, specificity = 0.838±0.096). Additionally, RE-ViT surpassed VGG-16 CNN results (accuracy = 0.553±0.079, ROC AUC = 0.748±0.080, sensitivity = 0.553±0.112, specificity = 0.777±0.089).

Conclusion: The proposed radiomics-embedded ViT was successfully developed for ultrasound-based breast tissue classification. Current results underscore the potential of our approach to advance other transformer-based medical image diagnosis tasks.
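The core architectural change described above is the element-wise addition of the radiomics embedding with ViT's patch-projection and positional embeddings. A minimal sketch of that fusion step in plain Python is shown below (a hypothetical illustration with toy dimensions; in the thesis, each 16×16 patch's 56-dimensional radiomics vector is first linearly projected to the ViT hidden size so all three embeddings share the same shape):

```python
def fuse_embeddings(patch_emb, radiomics_emb, pos_emb):
    """Element-wise sum of three per-patch embeddings.

    Each argument is a list of per-patch vectors with identical shape
    (num_patches x hidden_dim). Hypothetical helper, not the thesis code.
    """
    return [
        [p + r + q for p, r, q in zip(p_vec, r_vec, q_vec)]
        for p_vec, r_vec, q_vec in zip(patch_emb, radiomics_emb, pos_emb)
    ]

# Toy example: 2 patches, hidden_dim = 2 (integer values for clarity)
patch_emb     = [[1, 2], [3, 4]]        # standard ViT patch projection
radiomics_emb = [[10, 10], [20, 20]]    # projected radiomics features
pos_emb       = [[100, 200], [300, 400]]  # positional embedding

fused = fuse_embeddings(patch_emb, radiomics_emb, pos_emb)
# fused == [[111, 212], [323, 424]]
```

The fused sequence would then be consumed by the unmodified Transformer encoder and MLP head, which is why this injection point adds radiomics information without altering the rest of the ViT architecture.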

Citation

Zhu, Haiming (2024). A Radiomics-Embedded Vision Transformer for Breast Cancer Ultrasound Image Classification Efficiency Improvement. Master's thesis, Duke University. Retrieved from https://hdl.handle.net/10161/31055.

Duke's student scholarship is made available to the public using a Creative Commons Attribution / Non-commercial / No derivative (CC-BY-NC-ND) license.