Understanding and Modeling Human Planners’ Strategy in Human-automation Interaction in Treatment Planning Using Deep Learning and Reinforcement Learning

Limited Access
This item is unavailable until:
2025-05-25

Date

2023

Abstract

Purpose: Radiation therapy delivers high-energy radiation beams to eradicate cancer cells. Because radiation is toxic to normal tissue, a treatment planning process is needed to tailor the beams to the patient-specific treatment geometry while minimizing the dose to normal tissue. Treatment planning, however, is often a trial-and-error search for an optimal dose distribution. Breast cancer radiation therapy is one of the most common treatments in a modern radiation oncology department. Whole breast radiation therapy (WBRT) using electronic compensation is an iterative, time-consuming manual process. Our institution has used an artificial intelligence (AI)-based planning tool for WBRT for 3 years. It is unclear how human planners interact with AI in a real clinical setting and whether they can inject additional insight into a well-established AI model. Therefore, the first aim of this study is to model planners’ interaction with AI using a deep neural network (NN). In addition, we propose a multi-agent reinforcement learning-based framework (MultiRL-FE) that self-interacts with the treatment planning system, with location awareness, to improve plan quality via fluence editing.
Methods: A total of 1151 patients have been treated since the in-house AI-based planning tool was released for clinical use in 2019. All 526 patients treated with single-energy beams were included in this study. The AI tool automatically generates fluence maps and creates an “AI plan”. The planner then evaluates the plan and attempts manual fluence modification before the physician’s approval (“final plan”). The manual-modification-value (MMV) of each beamlet is the difference between the fluence maps of the “AI plan” and the “final plan”. The MMV was recorded for each planner.
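As an illustration of the MMV definition above, the following minimal Python sketch computes a per-beamlet MMV map from two fluence maps stored as nested lists. The function names, the sign convention (final minus AI) and the clipping of negative fluence to zero are assumptions made for this example; the abstract does not specify them, and this is not the thesis implementation.

```python
def manual_modification_value(ai_fluence, final_fluence):
    """Per-beamlet MMV, assuming the convention final-plan minus AI-plan fluence."""
    same_rows = len(ai_fluence) == len(final_fluence)
    if not same_rows or any(
        len(a) != len(f) for a, f in zip(ai_fluence, final_fluence)
    ):
        raise ValueError("fluence maps must share the same grid")
    return [
        [f - a for a, f in zip(ai_row, final_row)]
        for ai_row, final_row in zip(ai_fluence, final_fluence)
    ]


def apply_mmv(ai_fluence, mmv):
    """Add an MMV map to an AI fluence map.

    Negative results are clipped to zero (an assumption of this sketch,
    since fluence cannot be negative).
    """
    return [
        [max(a + m, 0.0) for a, m in zip(ai_row, mmv_row)]
        for ai_row, mmv_row in zip(ai_fluence, mmv)
    ]


# Toy 2x2 fluence maps (hypothetical values, arbitrary units).
ai_map = [[1.0, 2.0], [3.0, 4.0]]
final_map = [[1.5, 1.0], [3.0, 4.5]]
mmv_map = manual_modification_value(ai_map, final_map)
```

Under this sign convention, adding the MMV map back to the AI fluence map recovers the final-plan fluence map.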
In the first aim, a deep NN with a UNet3+ architecture was developed to predict the MMV from the AI fluence map, the corresponding dose map, and the organ map in the beam’s eye view (BEV). The predicted MMV maps were then applied to the initial “AI plans” to generate AI-modified plans (“AI-m plans”). In the second aim, we developed MultiRL-FE to self-interact with a given plan to improve plan quality. A simplified treatment planning system was built in Python as the environment for training the agents. Each pixel in the fluence map was assigned an individual agent that interacts with the environment by editing the fluence value and receives rewards based on the dose profile along the projected beam ray. The asynchronous advantage actor-critic (A3C) algorithm was used as the backbone for training the reinforcement learning agents. To train the agents effectively, we built the MultiRL-FE framework by embedding A3C in a fully convolutional neural network. To test the feasibility of the proposed framework, twelve patients from the same cohort were collected (6 for training and 6 for testing). The “final plans” were perturbed with 10% dose variation to evaluate the framework’s potential to improve a plan. The agent was designed to modify the fluence maps iteratively for 10 iterations. The modified fluence intensity was imported into the Eclipse treatment planning system for dose calculation. For both aims, plan quality was evaluated by dosimetric endpoints including breast planning target volume (PTV) V95%(%), V105%(%), V110%(%), lung V20Gy(%) and heart V5Gy(%).
Results: In the first aim, the “AI-m plans” generated by the HAI network showed a statistically significant improvement (p<.05) in hotspot control compared with the initial “AI plans”, with an average change of -25.2 cc in breast V105% volume and -0.805% in Dmax. PTV coverage was similar to that of the “AI plans” and the “final plans”.
In the second aim, testing of MultiRL-FE showed that the RL-modified plans achieved a substantial hotspot reduction relative to the initial plans. The average PTV V105%(%) of the testing set was reduced from 77.78 (±2.78) to 16.97 (±9.42), compared with 3.34 (±2.73) for the clinical plans. Meanwhile, the modified plans showed improved dose coverage over the clinical plans, with a V95%(%) of 70.45 (±3.94) compared with 65.44 (±5.39).
Conclusions: In the first part of this study, we proposed an HAI model that enhances the clinical AI tool by reducing hotspot volume from a human perspective. By understanding and modeling human-automation interaction, this study could advance the widespread clinical application of AI tools in radiation oncology departments with improved robustness and acceptability. In the second part, we developed a self-interactive treatment planning agent using multi-agent reinforcement learning. It offers fast, location-aware dose editing and can serve as an alternative optimization tool for intensity-modulated radiation therapy and electronic tissue compensation-based treatment planning.
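The dosimetric endpoints named in the Methods (for example PTV V95%(%) or lung V20Gy(%)) are dose-volume metrics. The sketch below shows one common way to compute them from a list of voxel doses inside a structure. It assumes equal voxel volumes and a known prescription dose; the function names and toy dose values are hypothetical and do not come from the thesis.

```python
def volume_at_or_above(doses_gy, threshold_gy):
    """Percent of a structure's voxels receiving at least threshold_gy.

    Assumes every voxel has the same volume, so a voxel count stands in
    for volume.
    """
    if not doses_gy:
        raise ValueError("structure contains no voxels")
    hit = sum(1 for dose in doses_gy if dose >= threshold_gy)
    return 100.0 * hit / len(doses_gy)


def v_relative(doses_gy, prescription_gy, percent):
    """Vx%(%): percent volume receiving at least x% of the prescription dose."""
    return volume_at_or_above(doses_gy, prescription_gy * percent / 100.0)


# Toy voxel doses in Gy with a hypothetical 50 Gy prescription.
ptv_doses = [40.0, 48.0, 50.0, 53.0]
v95 = v_relative(ptv_doses, 50.0, 95)    # coverage metric
v105 = v_relative(ptv_doses, 50.0, 105)  # hotspot metric

# Absolute-dose endpoints such as lung V20Gy(%) use the threshold directly.
lung_v20 = volume_at_or_above([5.0, 25.0], 20.0)
```

With these definitions, a higher V95%(%) indicates better target coverage and a lower V105%(%) indicates better hotspot control, which is how the values reported above are interpreted.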

Citation

Yang, Dongrong (2023). Understanding and Modeling Human Planners’ Strategy in Human-automation Interaction in Treatment Planning Using Deep Learning and Reinforcement Learning. Master's thesis, Duke University. Retrieved from https://hdl.handle.net/10161/27813.

Duke's student scholarship is made available to the public using a Creative Commons Attribution / Non-commercial / No derivative (CC-BY-NC-ND) license.