Response Assessment and Prediction in Esophageal Cancer Patients via F-18 FDG PET/CT Scans
Purpose: The purpose of this study is to use F-18 FDG PET/CT scans to identify an indicator of the response of esophageal cancer patients during radiation therapy. Such an indicator is needed because local failures remain common in esophageal cancer patients despite modern treatment techniques. If an indicator is found, a patient's treatment strategy could be altered to potentially improve the outcome. This is investigated with various standardized uptake value (SUV) metrics along with image texture features. The metrics and features that show the most promise as indicators of response are then used in logistic regression analysis to derive an equation for predicting response.
Materials and Methods: Twenty-eight patients underwent F-18 FDG PET/CT scans prior to the start of radiation therapy (RT). A second PET/CT scan was acquired after the delivery of approximately 32 gray (Gy). A physician-contoured gross tumor volume (GTV) was used to delineate a PET-based GTV (GTV-pre-PET) by thresholding at >40% and at >20% of the maximum SUV in the GTV. Deformable registration in VelocityAI software was used to register the pre-treatment and intra-treatment CT scans so that the GTV-pre-PET contours could be transferred from the pre-treatment to the intra-treatment scans (GTV-intra-PET). The fractional decreases in the maximum SUV, the mean SUV, the SUV to the highest-intensity 10% through 90% of the volume, and combinations of the SUV metrics found to be significant were compared with post-treatment pathologic response to assess them as indicators of response. Next, for the >40% threshold, texture features based on a neighborhood gray-tone difference matrix (NGTDM) were analyzed. The fractional decreases in coarseness, contrast, busyness, complexity, and texture strength were compared with the pathologic response of the patients. From these two analyses (SUV metrics and texture features), the two most significant results were used in logistic regression analysis to derive an equation for the probability that a patient is a non-responder. These probabilities were then compared with the pathologic response to test whether they indicate response.
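A minimal Python sketch of the SUV-metric steps follows, assuming NumPy arrays of SUVs and boolean masks on a common voxel grid (that is, the intra-treatment scan has already been deformably registered and resampled, a step VelocityAI handled in this study). The function names and the synthetic data are hypothetical. The abstract does not define "SUV to the highest X%" precisely; the sketch assumes the intensity-volume analogue of the dose-volume metric D_x (the lowest SUV among the hottest X% of voxels), and the mean SUV of those voxels would be an equally plausible reading. The combination metrics are not reproduced because their construction is not described.

```python
import numpy as np

def pet_gtv_mask(suv, gtv_mask, fraction=0.40):
    """PET-based GTV: voxels inside the physician GTV with SUV > fraction * max SUV in the GTV."""
    suv_max = suv[gtv_mask].max()
    return gtv_mask & (suv > fraction * suv_max)

def suv_hottest_percent(suv, mask, percent):
    """Assumed definition: lowest SUV among the hottest `percent`% of masked voxels
    (the intensity-volume analogue of the dose-volume metric D_x)."""
    values = np.sort(suv[mask])[::-1]  # masked SUVs, hottest first
    n = max(1, int(np.ceil(values.size * percent / 100.0)))
    return float(values[n - 1])

def fractional_decrease(pre, intra):
    """(pre - intra) / pre; positive when the metric falls during treatment."""
    return (pre - intra) / pre

# Hypothetical synthetic data: the intra-treatment volume is assumed to be already
# deformably registered and resampled onto the pre-treatment grid.
rng = np.random.default_rng(0)
pre_suv = rng.gamma(2.0, 2.0, size=(20, 20, 10))
intra_suv = 0.6 * pre_suv  # toy case: uniform 40% drop in every voxel
gtv = np.zeros(pre_suv.shape, dtype=bool)
gtv[5:15, 5:15, 2:8] = True

# The mask is built on the pre-treatment scan and reused unchanged on the intra scan.
gtv_pre_pet = pet_gtv_mask(pre_suv, gtv, fraction=0.40)
decreases = {
    p: fractional_decrease(suv_hottest_percent(pre_suv, gtv_pre_pet, p),
                           suv_hottest_percent(intra_suv, gtv_pre_pet, p))
    for p in (10, 20, 30)
}
print(decreases)  # each value is 0.4 up to floating-point rounding in this toy case
```

For the >20% threshold, the same functions apply with `fraction=0.20`.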
Results: Twenty of the 28 patients underwent post-treatment surgery, and their pathologic response was determined. Nine patients were classified as responders (treatment effect grade ≤ 1) and 11 as non-responders (treatment effect grade > 1). The fractional decreases in the SUV metrics showed that the most commonly used metrics, maximum SUV and mean SUV, were not significant in determining response to treatment. Other SUV metrics, however, did show promise as indicators. For the >40% threshold, the SUV to the highest 10%, 20%, and 30% of the volume (SUV10%, SUV20%, SUV30%) significantly distinguished between responders and non-responders (p=0.004), with an area under the receiver operating characteristic curve (AUC) of 0.7778. Combinations of these significant metrics (SUV10% with SUV20%, and SUV20% with SUV30%) were also able to distinguish response (p=0.033, AUC=0.7879). Cross validation showed that these metrics could be used to determine response in previously unseen data. In cross validation, the three individual SUV metrics distinguished responders from non-responders with a sensitivity of 0.7143 and a specificity of 0.6400. Cross validation yielded a sensitivity of 0.8333 and a specificity of 0.7727 for the combination of SUV10% and SUV20%, and a sensitivity of 0.8333 and a specificity of 0.7273 for the combination of SUV20% and SUV30%. For the >20% threshold, two SUV metrics were found to be significant: the SUV to the highest 10% and 20% of the volume (p=0.0048). The AUC was 0.7677 for the 10% metric and 0.7374 for the 20% metric. Cross validation of these two metrics showed that the 10% metric was the better indicator, distinguishing response in unseen data with a sensitivity of 0.7778 and a specificity of 0.7727.
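The AUC, sensitivity, and specificity figures can be computed from per-patient scores with the short sketch below, assuming non-responders are the positive class (label 1) and that a higher score points toward non-response; for a fractional decrease, where a larger drop presumably indicates response, one would pass the negated decrease. The AUC here is the Mann-Whitney pair-counting statistic. With 9 responders and 11 non-responders there are 99 responder/non-responder pairs, and the reported AUCs are consistent with that granularity (for example, 0.7778 = 77/99 and 0.7879 = 78/99). The toy data and function names are hypothetical, and because the cross-validation scheme is not specified, only in-sample metrics are shown.

```python
import numpy as np

def auc_mann_whitney(scores, labels):
    """AUC = fraction of (positive, negative) pairs ranked correctly; ties count one half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((greater + 0.5 * ties) / (pos.size * neg.size))

def sensitivity_specificity(scores, labels, threshold):
    """Call a patient positive (non-responder) when score >= threshold (assumed tie rule)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    predicted = scores >= threshold
    sensitivity = np.sum(predicted & labels) / np.sum(labels)
    specificity = np.sum(~predicted & ~labels) / np.sum(~labels)
    return float(sensitivity), float(specificity)

# Hypothetical toy data (not the thesis data): label 1 = non-responder,
# score = a quantity expected to be higher for non-responders.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0]
print(auc_mann_whitney(scores, labels))              # 5 of 6 pairs ranked correctly
print(sensitivity_specificity(scores, labels, 0.5))  # 2 of 3 positives, 2 of 2 negatives
```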
The only texture feature able to determine response was complexity (p=0.04, AUC=0.7778). This metric was no more significant than the three individual SUV metrics (its AUC was identical) and was less significant than both combination metrics. As with the SUV metrics, cross validation demonstrated the robustness of this result: complexity distinguished response with a sensitivity of 0.8333 and a specificity of 0.7273. A logistic regression fit using the two most significant results (complexity and the combination of SUV10% with SUV20%) yielded the most significant result (p=0.004, AUC=0.8889). Cross validation of this model resulted in a sensitivity of 0.7982 and a specificity of 0.7940, suggesting that the model would predict response accurately in unseen data.
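For the texture analysis, a sketch of the NGTDM and its complexity feature follows. It uses the complexity definition of Amadasun and King as commonly restated (for example, by the Image Biomarker Standardisation Initiative), with N taken as the number of voxels that contribute to the matrix. The equal-width quantization into 64 gray levels, the 26-voxel neighborhood restricted to the region of interest, and the skipping of voxels that have no neighbors are assumptions; the abstract does not state these settings, and NGTDM values are sensitive to them. Function names and data are hypothetical.

```python
import itertools
import numpy as np

def quantize(suv, mask, n_levels=64):
    """Equal-width binning of masked SUVs into gray levels 1..n_levels (assumed scheme)."""
    levels = np.zeros(suv.shape, dtype=int)
    lo, hi = suv[mask].min(), suv[mask].max()
    if hi == lo:
        levels[mask] = 1
        return levels
    scaled = (suv[mask] - lo) / (hi - lo)
    levels[mask] = np.minimum(np.floor(scaled * n_levels).astype(int) + 1, n_levels)
    return levels

def ngtdm(levels, mask):
    """Return gray levels present, probabilities p_i, NGTDM entries s_i, and voxel count N.
    Neighbors are all adjacent voxels (26 in 3D) inside the mask; the center is excluded."""
    offsets = [o for o in itertools.product((-1, 0, 1), repeat=levels.ndim) if any(o)]
    s, n = {}, {}
    for idx in zip(*np.nonzero(mask)):
        neighbors = []
        for o in offsets:
            j = tuple(a + b for a, b in zip(idx, o))
            if all(0 <= c < size for c, size in zip(j, levels.shape)) and mask[j]:
                neighbors.append(levels[j])
        if not neighbors:
            continue  # isolated voxel: no neighborhood average, so it is skipped
        i = int(levels[idx])
        s[i] = s.get(i, 0.0) + abs(i - np.mean(neighbors))  # s_i accumulates |i - A_i|
        n[i] = n.get(i, 0) + 1
    keys = sorted(n)
    counts = np.array([n[k] for k in keys], dtype=float)
    total = counts.sum()
    return np.array(keys), counts / total, np.array([s[k] for k in keys]), total

def complexity(gray, p, s, total):
    """Sum over present gray levels i, j of |i - j| / (N (p_i + p_j)) * (p_i s_i + p_j s_j)."""
    diff = np.abs(gray[:, None] - gray[None, :])
    ps = p * s
    return float(np.sum(diff / (total * (p[:, None] + p[None, :])) * (ps[:, None] + ps[None, :])))

# Hypothetical usage: fractional decrease in complexity within a toy >40% threshold ROI.
rng = np.random.default_rng(1)
pre = rng.gamma(2.0, 2.0, size=(12, 12, 6))
intra = 0.5 * pre + rng.normal(0.0, 0.3, size=pre.shape)
roi = pre > 0.40 * pre.max()
c_pre = complexity(*ngtdm(quantize(pre, roi), roi))
c_intra = complexity(*ngtdm(quantize(intra, roi), roi))
print((c_pre - c_intra) / c_pre)
```

Coarseness, contrast, busyness, and strength are computed from the same p_i, s_i, and N, so they can be added alongside `complexity` once the matrix is built.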
Conclusions: This study revealed that the previously used SUV metrics, maximum and mean SUV, may need to be reconsidered as indicators of response in esophageal cancer patients. The most promising SUV metric was the combination of SUV10% and SUV20% for a GTV created with a threshold of >40% of the maximum SUV, while the most significant texture feature was complexity. The overall best indicator was the logistic regression fit of complexity and the combination of SUV10% with SUV20%, which distinguished responders from non-responders at a probability threshold of 0.3186 (sensitivity=0.9091, specificity=0.7778).
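The logistic regression step can be sketched as a maximum-likelihood fit by Newton-Raphson, giving P(non-responder) = 1 / (1 + exp(-(b0 + b1 x1 + b2 x2))), followed by classification at the 0.3186 cut-off. The feature values and function names are hypothetical, the >= tie rule is an assumption, and the thesis's fitted coefficients are not given in the abstract, so the cut-off is applied to a toy model purely for illustration. The reported sensitivity of 0.9091 and specificity of 0.7778 are consistent with 10 of 11 non-responders and 7 of 9 responders being classified correctly.

```python
import numpy as np

def add_intercept(X):
    """Prepend a column of ones so beta[0] is the intercept b0."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(X.shape[0]), X])

def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson; returns [b0, b1, ..., bk]."""
    A = add_intercept(X)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        gradient = A.T @ (y - p)                        # score vector of the log-likelihood
        hessian = A.T @ (A * (p * (1.0 - p))[:, None])  # observed (= Fisher) information
        step = np.linalg.solve(hessian, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def prob_non_responder(beta, X):
    """P(non-responder) = 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk)))."""
    return 1.0 / (1.0 + np.exp(-add_intercept(X) @ beta))

# Hypothetical toy features (not the thesis data): column 0 = fractional decrease in
# complexity, column 1 = a combined SUV10%/SUV20% metric; y = 1 marks a non-responder.
X = np.array([[0.2, 0.1], [0.2, 0.1], [0.2, 0.1], [0.6, 0.1], [0.6, 0.1],
              [0.2, 0.5], [0.2, 0.5], [0.6, 0.5], [0.6, 0.5], [0.6, 0.5]])
y = np.array([1, 1, 0, 1, 0, 1, 0, 0, 0, 1])

beta = fit_logistic(X, y)
p = prob_non_responder(beta, X)
predicted_non_responder = p >= 0.3186  # cut-off reported for the thesis's own fitted model
print(beta, p.round(3), predicted_non_responder)
```

With this toy design the fitted probabilities reproduce the observed proportion of non-responders in each of the four feature combinations (2/3, 1/2, 1/2, 1/3), because the toy data contain no interaction between the two features; real data would generally not be fit this exactly.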
Medical imaging and radiology
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Rights for Collection: Masters Theses