Benign and Malignant Lymph Nodes Classification in Non-Small Cell Lung Cancer via Machine Learning Model

dc.contributor.advisor

Yin, Fang-Fang

dc.contributor.author

Ge, Jingyu

dc.date.accessioned

2024-06-06T13:50:13Z

dc.date.issued

2024

dc.department

Medical Physics

dc.description.abstract

Objective: To develop a machine learning model that integrates deep learning image features and radiomic features to classify lymph nodes as benign or malignant in non-small cell lung cancer (NSCLC). Methods: The dataset comprises preoperative contrast-enhanced CT scans from 541 lung cancer patients, collected at a hospital in Shanghai between July 2015 and December 2017 as part of an IRB study. It includes 1,237 lymph nodes that were identified on the preoperative CT scans because of enlargement, with the NSCLC diagnosis confirmed by surgical pathology. Lymph nodes were labeled as malignant or benign according to the postoperative pathology reports. Our method employs a dual feature extraction strategy that pairs deep image features with radiomic features. The deep image features (DIF) were derived from the final convolutional layer of a pre-trained VGG-16 encoder network to characterize the lymph node's image texture. Nine 2D shape-based radiomic features (RF) were extracted with the PyRadiomics toolbox to characterize lymph node morphology. In addition, ninety-two handcrafted radiomic features (HRF) were extracted. The extracted DIF, RF, and HRF were combined and fed into a Random Forest classifier for benign versus malignant lymph node classification. The classifier was trained with an 8:2 train/test split and evaluated using the receiver operating characteristic (ROC) curve, the area under the curve (AUC), and p-values; 5-fold cross-validation was also employed to evaluate model performance objectively.
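The feature-fusion and evaluation steps described in the Methods can be sketched roughly as follows. This is a minimal illustration with scikit-learn on synthetic data, not the thesis code: the 512-wide DIF vector (one value per VGG-16 final-layer channel after pooling), the sample count, the number of trees, and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, train_test_split

rng = np.random.default_rng(0)
n_nodes = 200  # synthetic stand-in, not the 1,237 nodes in the study

# Balanced synthetic labels: 1 = malignant, 0 = benign.
labels = rng.permutation(np.repeat([0, 1], n_nodes // 2))

# Hypothetical feature blocks with a weak class-dependent shift.
dif = rng.normal(size=(n_nodes, 512)) + 0.15 * labels[:, None]    # deep image features (assumed width)
shape_rf = rng.normal(size=(n_nodes, 9)) + 0.5 * labels[:, None]  # nine 2D shape features

# Fuse the feature blocks by concatenating them per lymph node.
features = np.hstack([dif, shape_rf])

# 8:2 train/test split, stratified so both classes appear in each part.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, stratify=labels, random_state=0
)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
holdout_auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

# 5-fold cross-validation: one AUC per fold, then the mean.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_aucs = []
for train_idx, val_idx in cv.split(features, labels):
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(features[train_idx], labels[train_idx])
    scores = model.predict_proba(features[val_idx])[:, 1]
    fold_aucs.append(roc_auc_score(labels[val_idx], scores))
mean_auc = float(np.mean(fold_aucs))

print(f"hold-out AUC: {holdout_auc:.3f}, mean 5-fold AUC: {mean_auc:.3f}")
```

Dropping or swapping a block in the `np.hstack` call gives the single-feature and pairwise configurations compared in the Results.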

Results: The mean AUC for the Random Forest classifier using only 2D shape features was 0.691, while the mean AUC for the classifier using only DIF was 0.726. Using both DIF and HRF for classification yielded a mean AUC of 0.724, whereas integrating RF with DIF achieved the best classification performance, with the highest mean AUC of 0.746. All results were statistically significant (p < 0.05). Conclusion: Combining image texture analysis (DIF) with morphological information (RF) offers an enhanced ability to classify lymph nodes as benign or malignant from CT images of NSCLC patients.
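For readers less familiar with the metric reported above, AUC can be read as the probability that a randomly chosen malignant node receives a higher score than a randomly chosen benign one. The sketch below computes it by direct pairwise comparison using only the Python standard library; the function name and the toy scores are illustrative assumptions, not data from the thesis.

```python
from statistics import mean


def roc_auc(labels, scores):
    """AUC as the fraction of (malignant, benign) pairs ranked correctly.

    labels: 1 for malignant, 0 for benign. Tied scores count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 7 of the 9 malignant/benign pairs are ordered correctly.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)

# A mean AUC is the average of the per-fold values (made-up numbers here).
example_fold_aucs = [0.70, 0.75, 0.80]
example_mean_auc = mean(example_fold_aucs)
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; library implementations use a rank-based computation instead.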

dc.identifier.uri

https://hdl.handle.net/10161/31052

dc.rights.uri

https://creativecommons.org/licenses/by-nc-nd/4.0/

dc.subject

Medical imaging

dc.subject

Lymph node

dc.subject

Machine Learning

dc.subject

Non-Small Cell Lung Cancer

dc.subject

Radiomics

dc.title

Benign and Malignant Lymph Nodes Classification in Non-Small Cell Lung Cancer via Machine Learning Model

dc.type

Master's thesis

duke.embargo.months

24

duke.embargo.release

2026-06-06T13:50:13Z
