Knowledge Discovery in Databases of Radiation Therapy Treatment Planning

Sheng, Yang


Yin, Fang-Fang

Radiation has been used in medicine for multiple purposes, and its use in treating cancer has grown steadily over the last century. The radiation beam is directed at the tumor cells while the surrounding healthy tissue is spared as much as possible. Radiation therapy treatment planning aims to deliver a highly concentrated radiation dose to the treatment volume while minimizing the dose to normal tissue. With the advent of more sophisticated delivery technology, treatment planning time has increased. In addition, treatment plan quality depends on the experience of the planner. Several computer-assisted techniques have emerged to help the treatment planning process, among which knowledge-based planning (KBP) has been successful in inverse planning for intensity-modulated radiation therapy (IMRT). KBP falls under the umbrella of Knowledge Discovery in Databases (KDD), which originated in industry. The philosophy is to extract useful knowledge from previous applications, data, and observations in order to make predictions in future practice. KBP reduces the iterative trial-and-error process of manual planning and, more importantly, guarantees consistent plan quality. Despite the great potential of treatment planning KDD (TPKDD), three major challenges remain before TPKDD can be widely implemented in the clinical environment: 1. a good knowledge model requires a sufficient amount of training data to extract useful knowledge, which makes modeling less efficient; 2. a knowledge model is usually applicable only to a specific treatment site and treatment technique and is therefore less generalizable; 3. a knowledge model needs meticulous inspection to verify its robustness before it is implemented in the clinic.

This study aims to fill this gap in TPKDD and to improve the current TPKDD workflow by tackling the aforementioned challenges. The study is divided into three parts. The first part aims to improve modeling efficiency by introducing atlas-based treatment planning guidance. In the second part, an automated treatment planning technique for whole breast radiation therapy (WBRT) is proposed to provide a solution for an area that TPKDD has not yet reached. In the third part, several topics related to knowledge model quality are addressed, including improving the model training workflow, identifying geometric novelty and dosimetric outlier cases, building a global model, and facilitating incremental learning.

I. Improvement of the modeling efficiency. First, a prostate cancer patient anatomy atlas was established to generate 3D dose distribution guidance for a new patient. The anatomical pattern of each prostate cancer patient was parameterized with two descriptors, so that each training case was represented in a 2D feature space. All training cases were clustered using the k-medoids algorithm, and the optimal number of clusters was determined by the largest average silhouette width. For a new case, the most similar case in the atlas was identified and used to generate dose guidance. The anatomies of the atlas case and the query case were registered, and the resulting deformation field was applied to the 3D radiation dose of the atlas case. The deformed dose served as the goal dose for the query case. Dose volume objectives were then extracted from the goal dose to guide the inverse IMRT planning. Results showed that the plans generated with atlas guidance had dosimetric quality similar to that of the clinical manual plans. The monitor units (MU) of the auto plans were also comparable with those of the clinical plans. Atlas-guided planning proved to be effective and efficient for inverse IMRT planning.
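The clustering step described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the 2D feature values are synthetic, the function names (pairwise_dist, k_medoids, mean_silhouette) are hypothetical, the farthest-first initialization is a simplification chosen for determinism, and treating the cluster medoids as the atlas and matching a query by Euclidean distance are assumptions made only for illustration.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance matrix between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def k_medoids(D, k, n_iter=100):
    """Alternating k-medoids on a precomputed distance matrix D."""
    # Deterministic farthest-first initialization (an assumption of this sketch).
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        medoids.append(int(np.argmax(D[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if members.size:
                # New medoid: the member with the smallest total distance to its cluster.
                costs = D[np.ix_(members, members)].sum(axis=1)
                new[c] = members[np.argmin(costs)]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids, np.argmin(D[:, medoids], axis=1)

def mean_silhouette(D, labels):
    """Average silhouette width; singleton clusters contribute 0."""
    clusters = np.unique(labels)
    s = np.zeros(D.shape[0])
    for i in range(D.shape[0]):
        own = labels == labels[i]
        if own.sum() == 1:
            continue
        a = D[i, own].sum() / (own.sum() - 1)  # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()

# Synthetic stand-in for the two anatomical descriptors of each training case.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in centers])
D = pairwise_dist(X)

# Choose the number of clusters with the largest average silhouette width.
scores = {k: mean_silhouette(D, k_medoids(D, k)[1]) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
medoids, labels = k_medoids(D, best_k)

# Match a hypothetical query case to the closest medoid (assumed atlas lookup).
query = np.array([4.8, 0.2])
atlas_case = int(medoids[np.argmin(np.linalg.norm(X[medoids] - query, axis=1))])
```

In this sketch the dose deformation step is omitted; atlas_case only identifies which training case would supply the dose to be deformed.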

II. Improvement of model generalization. An automatic WBRT treatment planning workflow was developed. First, an energy selection tool was developed based on previous single-energy and dual-energy WBRT plans. The digitally reconstructed radiograph (DRR) intensity histograms of the training cases were collected, and principal component analysis (PCA) was performed to reduce the dimension of the histograms. The first two components were used to represent each case, and classification was performed in this 2D space. The tool helps select the appropriate energy for a new patient based on anatomical information. Second, an anatomy-feature-based random forest (RF) model was proposed to predict the fluence map for the patient. The model took multiple anatomical features as input and output the fluence intensity of each pixel within the fluence map. Finally, a physics-rule-based method was proposed to further fine-tune the fluence map to achieve an optimal dose distribution within the irradiated volume. Additional validation cases were tested with the proposed workflow. Results showed similar dosimetric quality between the auto plans and the clinical manual plans. The treatment planning time was reduced from 1 to 4 hours for manual planning to within 1 minute for auto planning. The proposed automatic WBRT planning technique proved to be efficient.
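The histogram reduction used by the energy selection tool can be sketched as follows. This is an illustration under stated assumptions, not the author's tool: the histograms are synthetic, the helper names (pca_fit, pca_transform, fake_histogram, classify) are hypothetical, and the nearest-centroid rule stands in for the classifier, which the abstract does not specify.

```python
import numpy as np

def pca_fit(H, n_components=2):
    """PCA via SVD of the mean-centered data matrix H (cases x bins)."""
    mean = H.mean(axis=0)
    _, S, Vt = np.linalg.svd(H - mean, full_matrices=False)
    explained = (S ** 2) / (S ** 2).sum()
    return mean, Vt[:n_components], explained[:n_components]

def pca_transform(H, mean, components):
    """Project histograms onto the retained principal components."""
    return (H - mean) @ components.T

rng = np.random.default_rng(0)
n_bins = 32

def fake_histogram(center):
    """Synthetic stand-in for a DRR intensity histogram."""
    samples = rng.normal(center, 0.1, size=500)
    hist, _ = np.histogram(samples, bins=n_bins, range=(0.0, 1.0), density=True)
    return hist

# 25 hypothetical single-energy cases (label 0) and 25 dual-energy cases (label 1).
H = np.array([fake_histogram(0.4) for _ in range(25)]
             + [fake_histogram(0.6) for _ in range(25)])
energy = np.array([0] * 25 + [1] * 25)

mean, components, explained = pca_fit(H)
scores = pca_transform(H, mean, components)  # each case as a point in 2D

# Assumed classifier: assign a case to the nearest class centroid in PCA space.
centroids = np.array([scores[energy == c].mean(axis=0) for c in (0, 1)])

def classify(hist):
    z = pca_transform(hist[None, :], mean, components)[0]
    return int(np.argmin(np.linalg.norm(centroids - z, axis=1)))

predicted = np.array([classify(h) for h in H])
accuracy = (predicted == energy).mean()
```

The accuracy here is measured on the training histograms only, so it says nothing about generalization; it simply shows that two components can separate the two synthetic groups.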

III. Rapid learning of radiation therapy KBP. Several topics were analyzed in this part of the study. First, a systematic workflow was established to improve KBP model quality. The workflow started by identifying geometric novelty cases using the statistical metric "leverage" and removing them. Dosimetric outliers were then identified using the studentized residual and removed as well. The cleaned model was compared with the uncleaned model using additional validation cases. Pelvic cases were used as an example. Results showed that the presence of novelty and outlier cases degraded model quality, and that the proposed statistical tools can effectively identify such cases. The workflow is able to improve the quality of the knowledge-based model.
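The two statistics in this workflow have standard definitions for ordinary least-squares regression, and a minimal sketch of the two-stage cleaning is given here. The single synthetic feature, the function name leverage_and_studentized, and the cut-offs (leverage above 3p/n, absolute studentized residual above 3) are assumptions made for illustration; the abstract does not state which thresholds or regression form were used.

```python
import numpy as np

def leverage_and_studentized(X, y):
    """Leverage and externally studentized residuals for OLS with intercept."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])
    p = A.shape[1]                      # number of fitted parameters
    Hat = A @ np.linalg.pinv(A)         # hat matrix
    h = np.diag(Hat)                    # leverage of each case
    resid = y - Hat @ y
    sse = resid @ resid
    # Residual variance estimated with case i left out.
    s2_i = (sse - resid ** 2 / (1 - h)) / (n - p - 1)
    t = resid / np.sqrt(s2_i * (1 - h))
    return h, t

# Synthetic geometry feature x and dosimetric response y.
rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 40)
x[0] = 8.0                               # geometric novelty: unusual anatomy
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(40)
y[1] += 3.0                              # dosimetric outlier: usual anatomy, odd dose
X = x[:, None]

# Step 1: flag and remove high-leverage (novelty) cases.
h, _ = leverage_and_studentized(X, y)
p = X.shape[1] + 1
novelty = np.where(h > 3 * p / len(y))[0]          # assumed threshold
keep = np.setdiff1d(np.arange(len(y)), novelty)

# Step 2: refit without the novelty cases and flag dosimetric outliers.
_, t = leverage_and_studentized(X[keep], y[keep])
outliers = keep[np.abs(t) > 3]                     # assumed threshold

# Indices of the cleaned training pool.
clean = np.setdiff1d(keep, outliers)
```

Running the leverage step first matters in this sketch: a high-leverage case pulls the fit toward itself, so its own residual can look small even when the case is atypical.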

Second, a clustering-based method was proposed to identify multiple geometric novelty cases and dosimetric outlier cases at the same time. A one-class support vector machine (OCSVM) was applied to the feature vectors of all training cases to form a single class of inliers; cases falling outside the frontier were assigned to the novelty group. Once the novelty cases were identified and removed, robust regression followed by outlier identification (ROUT) was applied to the remaining cases to identify dosimetric outliers. A cleaned model was trained with the novelty-free and outlier-free case pool and tested using 10-fold cross-validation. The initial training pool included intentionally added outlier cases to evaluate the efficacy of the proposed method. The model prediction on the inlier cases was compared with that on the novelty and outlier cases. Results showed that the method can successfully identify geometric novelty and dosimetric outlier cases. The model prediction accuracy differed significantly between the inliers and the novelty/outlier cases, indicating different dosimetric behavior in the two groups. The proposed method proved to be effective in identifying multiple geometric novelty and dosimetric outlier cases.

Third, a global model based on a model tree and a clustering-based model was proposed to include cases with different clinical conditions and indications. The model tree combines a decision tree with linear regression: all cases are branched into leaves, and regression is performed within each leaf. The clustering-based model uses the k-means algorithm to segment all cases into more homogeneous groups, and regression is then performed within each group. The underlying philosophy of both methods is that cases with similar features have a similar geometry-dosimetry relation, so training on cases within a narrow feature range gives better model accuracy. The proposed method proved to be effective in improving accuracy over a model trained on all cases without segmentation.
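The clustering-based variant can be sketched as k-means followed by a separate linear regression in each cluster. The example below is a simplified illustration, not the dissertation's model: the one-dimensional feature, the synthetic geometry-dosimetry relations, the deterministic farthest-first initialization, and the function names (kmeans, fit_linear, predict_linear, predict_clustered) are all assumptions. The model tree is not shown.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Lloyd's k-means with a deterministic farthest-first start."""
    centers = [X[0]]
    while len(centers) < k:
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = assign(X, centers)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return centers, assign(X, centers)

def assign(X, centers):
    """Index of the nearest center for each row of X."""
    return np.argmin(np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2), axis=1)

def fit_linear(X, y):
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def make_cases(rng, n):
    """Two synthetic case groups with different geometry-dosimetry relations."""
    xa = rng.uniform(0.0, 1.0, n)
    xb = rng.uniform(3.0, 4.0, n)
    ya = 2.0 * xa + 0.05 * rng.standard_normal(n)
    yb = 20.0 - 3.0 * xb + 0.05 * rng.standard_normal(n)
    return np.concatenate([xa, xb])[:, None], np.concatenate([ya, yb])

rng = np.random.default_rng(0)
X_train, y_train = make_cases(rng, 30)
X_test, y_test = make_cases(rng, 20)

# Single model trained on all cases without segmentation.
global_coef = fit_linear(X_train, y_train)

# Clustering-based model: one regression per k-means cluster.
centers, labels = kmeans(X_train, k=2)
cluster_coefs = [fit_linear(X_train[labels == c], y_train[labels == c]) for c in range(2)]

def predict_clustered(X):
    lab = assign(X, centers)
    out = np.empty(len(X))
    for c, coef in enumerate(cluster_coefs):
        if np.any(lab == c):
            out[lab == c] = predict_linear(coef, X[lab == c])
    return out

def rmse(pred, y):
    return float(np.sqrt(np.mean((pred - y) ** 2)))

rmse_global = rmse(predict_linear(global_coef, X_test), y_test)
rmse_clustered = rmse(predict_clustered(X_test), y_test)
```

The synthetic data are built so that the two groups follow different linear relations, which is the situation in which segmenting before regression is expected to help.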

Finally, incremental learning was analyzed for radiation therapy treatment planning models. This part of the study addresses the question of when model re-training should be invoked. In the clinical environment, it is often unnecessary to re-train the model every time a new case arrives. The scenario of incrementally adapting the model was simulated using the pelvic cases, with different numbers of training cases and new incoming cases. The results showed that re-training was often necessary for a small training dataset and that, as the number of cases increased, re-training was needed less frequently.

In summary, this study addressed three major challenges in TPKDD. In the first part, an atlas-guided treatment planning technique was proposed to improve modeling efficiency. In the second part, an automatic whole breast radiation therapy treatment planning technique was proposed to address an area that TPKDD has not yet covered. In the final part, outlier analysis, global model training, and incremental learning were analyzed to facilitate rapid learning, which lays the foundation for future clinical implementation of radiation therapy knowledge models.





Sheng, Yang (2017). Knowledge Discovery in Databases of Radiation Therapy Treatment Planning. Dissertation, Duke University. Retrieved from


Duke's student scholarship is made available to the public using a Creative Commons Attribution / Non-commercial / No derivative (CC-BY-NC-ND) license.