A Radiomics Machine Learning Model for Post-Radiotherapy Overall Survival Prediction of Non-Small Cell Lung Cancer (NSCLC)
Abstract
Purpose: To predict the post-radiotherapy overall survival group of NSCLC patients based on clinical information and radiomics analysis of the simulation CT.
Materials/Methods: A total of 258 non-adenocarcinoma patients who received radical radiotherapy or chemo-radiation were studied; 45/50/163 patients were assigned to the short (0-6 months)/mid (6-12 months)/long (12+ months) survival groups, respectively. For each patient, 76 radiomics features were first extracted within the gross tumor volume (GTV) identified on the simulation CT; these features were combined with the patient's clinical information (age, overall stage, and GTV volume) into a patient-specific feature vector, which was used by a two-step machine learning model for survival group prediction. The model first identifies patients with a long survival prediction via a supervised binary classifier; for the remaining patients, a second classifier generates a short or mid survival prediction. Two machine learning classifiers, explainable boosting machine (EBM) and balanced random forest (BRF), were compared. During model training, all patients were divided into training/test sets at an 8:2 ratio, and 100-fold random sampling was applied to the training set with a 7:1 validation ratio. Model performance was evaluated by sensitivity, accuracy, and ROC results.
Results: The model with EBM achieved an overall ROC AUC of 0.58±0.04, with limited sensitivities in short (0.02±0.04) and mid (0.11±0.08) group predictions due to the imbalanced sample distribution. In contrast, the model with BRF improved short/mid group sensitivities to 0.32±0.11/0.29±0.16, respectively, but the improvement in ROC AUC (0.60±0.04) was limited. Both the EBM (0.46±0.04) and BRF (0.57±0.04) approaches achieved limited overall accuracy; a noticeable overlap was found between their lists of the 10 top-ranked features by weight.
Conclusion: The proposed two-step machine learning model with the BRF classifier performs better than the one with the EBM classifier in post-radiotherapy survival group prediction of NSCLC. Future work, preferably combined with deep learning, is needed to further improve the prediction results.
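The two-step prediction logic and the per-group sensitivity metric described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the toy decision rules standing in for the trained EBM/BRF classifiers, and the example data are all hypothetical. The sketch also assumes half-open group boundaries ([0, 6), [6, 12), [12, ∞) months), since the abstract does not say which group a boundary value belongs to.

```python
from typing import Callable, Dict, Sequence

GROUPS = ("short", "mid", "long")


def survival_group(months: float) -> str:
    """Map overall survival in months to a survival group label.

    Assumption: boundaries are half-open, [0, 6) / [6, 12) / [12, inf).
    """
    if months < 0:
        raise ValueError("survival time must be non-negative")
    if months < 6:
        return "short"
    if months < 12:
        return "mid"
    return "long"


def two_step_predict(x, is_long: Callable, is_short: Callable) -> str:
    """Step 1 separates long survivors; step 2 splits the rest into short/mid.

    `is_long` and `is_short` stand in for two trained binary classifiers.
    """
    if is_long(x):
        return "long"
    return "short" if is_short(x) else "mid"


def per_group_sensitivity(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Sensitivity per group: correctly predicted members / true members."""
    result = {}
    for group in GROUPS:
        preds_for_members = [p for t, p in zip(y_true, y_pred) if t == group]
        if preds_for_members:
            hits = sum(p == group for p in preds_for_members)
            result[group] = hits / len(preds_for_members)
        else:
            result[group] = float("nan")  # no true members of this group
    return result


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Overall accuracy: fraction of patients assigned to the correct group."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy stand-ins for the two classifiers: hypothetical rules on a single score.
is_long = lambda score: score >= 0.5
is_short = lambda score: score < 0.2

months = [3, 8, 14, 20, 5, 10]             # hypothetical survival times
scores = [0.1, 0.3, 0.7, 0.4, 0.6, 0.25]   # hypothetical per-patient scores

y_true = [survival_group(m) for m in months]
y_pred = [two_step_predict(s, is_long, is_short) for s in scores]

print(per_group_sensitivity(y_true, y_pred))
print(round(accuracy(y_true, y_pred), 3))
```

Reporting sensitivity per group alongside overall accuracy matters here because the long group (163 of 258 patients) dominates the cohort, so accuracy alone can hide poor short/mid group predictions.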
Citation
Zhang, Rihui (2023). A Radiomics Machine Learning Model for Post-Radiotherapy Overall Survival Prediction of Non-Small Cell Lung Cancer (NSCLC). Master's thesis, Duke University. Retrieved from https://hdl.handle.net/10161/29091.
Duke's student scholarship is made available to the public using a Creative Commons Attribution / Non-commercial / No derivative (CC-BY-NC-ND) license.