Browsing by Subject "Reinforcement Learning"
Item Open Access
Automated Intensity Modulated Radiation Therapy (IMRT) using fast dose and fluence calculations and Reinforcement Learning (2024)
Stephens, Hunter Scott

Ionizing radiation is a powerful tool in the fight against cancer. Its potentially lethal effect on cells can halt or even eradicate tumor growth by destroying malignant tumor cells. This effect is desirable for the tumor but harmful to the surrounding healthy tissue. The goal of radiation therapy is to irradiate a tumor with a dose as close to the prescribed amount as possible while sparing the healthy surrounding tissue. Intensity Modulated Radiation Therapy (IMRT) shapes a dose distribution by modulating the intensity of the radiation field at different points. This creates a 2D intensity pattern that is linked to a 3D dose distribution through the transport of radiation through the patient. Determining this intensity pattern is a highly coupled numerical optimization problem that relies on a set of objective inputs. These inputs are determined by a human planner and iteratively updated to reach an optimal plan to be delivered to the patient. The constraints depend on the treatment site and may vary with patient anatomy.

Determining these constraints is especially time-consuming for cases involving the pancreas or the head and neck region. For the pancreas, several gastrointestinal (GI) structures, namely the stomach, bowel, and C-loop, are usually nestled close to the tumor. This introduces a tradeoff between delivering the necessary dose to the target and fully preserving those important organs. The head and neck region also poses problems in sparing organs proximal to the tumor, such as the parotid glands and oral cavity. Head and neck tumors can also be very large and asymmetric, with large overlaps with surrounding organs at risk.

The goal of this thesis was to develop and investigate a framework for automated treatment planning. This requires calculating the dose and optimal fluence, and developing a machine learning model to create relevant optimization structures and set constraints. The steps of the thesis are as follows. (i) First, a computationally cheap dose calculation algorithm was developed. Machine learning relies on numerous dose calculations, so the engine must be fast and lightweight. Many commercial algorithms are available, but for these purposes it is best to develop a custom engine tailored to the task to minimize cost. This was accomplished with an analytical finite-sized pencil beam model, parameterized in both depth and off-axis distance and fitted to the beams used in delivering treatment. A variable kernel width was introduced to reduce both the computation time and the storage cost of the calculation. (ii) Second, an optimization engine was developed to quickly find an optimal fluence map given a constraint set. The optimization problem relies on knowing the absorbed dose delivered by each finite-sized beamlet at every point. Computing this is quite expensive, and the cost must be reduced. Analysis was performed to ascertain the effect that the cost-reduction techniques introduced into the dose calculation would have on the optimization problem. The optimization algorithm was then evaluated to determine the optimal kernel truncation length.
(iii) Third, the problem of handling overlapping structures with contrasting constraints was formulated in a way that an auto-planning system can handle. Pancreas SBRT plans with a simultaneous integrated boost (SIB) are a good example of this situation. Previous auto-planning frameworks were modified to deal specifically with the dose gradient around these proximal regions, and a reinforcement learning agent was then trained to plan for these scenarios. (iv) Finally, the coupling of plan states and potential actions was elucidated for determining the control points of structures' volume-effect constraints. Principal Component Analysis (PCA), along with geometric properties such as inflection points and points of maximum curvature, was used to correlate the states of a dose-volume histogram with control actions. This was studied and implemented into the beginnings of an automated treatment planning system and demonstrated on head and neck cases. The system's state and action transition probabilities were also investigated to ascertain the stability of the learning process and to ensure that the state definition was complete and satisfied the properties of a Markov decision process. The automated system was tested to ascertain the computer agent's ability to learn to plan with multiple goals, and it was shown to be capable of learning planning techniques, providing a foundation for computer-automated and computer-aided planning.
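Step (i) gives no formulas, but the idea of an analytical pencil beam with depth-dependent width and a truncation radius can be sketched in a few lines. The Gaussian lateral profile, exponential depth attenuation, and every constant below are illustrative assumptions, not the thesis's fitted model:

```python
# Toy finite-sized pencil-beam dose kernel: Gaussian lateral profile whose
# width grows with depth (variable kernel width), truncated beyond a cutoff
# radius to cut computation and storage. All parameter values are illustrative.
import numpy as np

def pencil_beam_dose(depth_cm, r_cm, mu=0.05, sigma0=0.3, k=0.02, cutoff=3.0):
    """Dose contribution of one beamlet at depth depth_cm, off-axis distance r_cm."""
    sigma = sigma0 + k * depth_cm            # kernel width widens with depth
    if abs(r_cm) > cutoff * sigma:           # kernel truncation: treat tail as zero
        return 0.0
    axial = np.exp(-mu * depth_cm)           # exponential attenuation with depth
    lateral = np.exp(-0.5 * (r_cm / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return axial * lateral
```

Truncating the lateral tail at a few multiples of sigma keeps the per-beamlet dose footprint, and hence the dose-influence matrix used in the next step, sparse.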
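Step (ii) searches for an optimal fluence map given a dose-influence matrix and a constraint set. A common generic formulation, shown below with projected gradient descent, uses nonnegative beamlet weights, a two-sided penalty on target dose, and a one-sided penalty on organ-at-risk dose; the matrix `D` and all constants are hypothetical stand-ins rather than the thesis's optimizer:

```python
# Hypothetical fluence-map optimization: nonnegative beamlet weights x
# minimizing quadratic penalties on target dose deviation and one-sided
# organ-at-risk (OAR) overdose, via projected gradient descent.
import numpy as np

rng = np.random.default_rng(0)
n_voxels, n_beamlets = 500, 80
D = 0.1 * rng.random((n_voxels, n_beamlets))    # toy dose-influence matrix
is_target = np.zeros(n_voxels, dtype=bool)
is_target[:100] = True                          # first 100 voxels form the target
d_presc, d_oar_max = 1.0, 0.3                   # prescription dose / OAR cap (toy)

def loss_and_grad(x):
    d = D @ x                                            # dose at every voxel
    r_t = d[is_target] - d_presc                         # target deviation (two-sided)
    r_o = np.maximum(d[~is_target] - d_oar_max, 0.0)     # OAR excess only (one-sided)
    grad = D[is_target].T @ r_t + D[~is_target].T @ r_o
    return 0.5 * (r_t @ r_t + r_o @ r_o), grad

x = np.zeros(n_beamlets)
for _ in range(200):
    loss, grad = loss_and_grad(x)
    x = np.maximum(x - 1e-3 * grad, 0.0)                 # project onto x >= 0
```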
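Step (iv) turns a dose-volume histogram (DVH) into state features using PCA and geometric properties such as inflection points and points of maximum curvature. As a rough illustration of such descriptors, the snippet below builds a cumulative DVH from a toy dose sample and reads off two of them numerically; the sample, the binning, and the feature definitions are guesses for illustration, not the thesis's state definition:

```python
# Toy cumulative DVH plus two geometric descriptors of the kind the abstract
# names as state features. Dose sample, binning, and features are illustrative.
import numpy as np

doses = np.random.default_rng(1).normal(50.0, 8.0, 2000).clip(min=0.0)  # Gy, toy
bins = np.linspace(0.0, doses.max(), 200)
dvh = np.array([(doses >= d).mean() for d in bins])  # volume fraction receiving >= d

slope = np.gradient(dvh, bins)                       # first derivative of the DVH
second = np.gradient(slope, bins)                    # second derivative (curvature proxy)
d_inflection = bins[np.argmin(slope)]                # steepest falloff of the sigmoid-like curve
d_max_curvature = bins[np.argmax(np.abs(second))]    # shoulder of the curve
```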
Item Open Access
Toward Assured Autonomy with Model-Free Reinforcement Learning (2024)
Bozkurt, Alper Kamil

Autonomous systems (AS), enhanced by the capabilities of reinforcement learning (RL), are expected to perform increasingly sophisticated tasks across various civilian and industrial application domains. This expectation arises from their promising ability to make decisions based solely on perception, without human intervention. In addition to high efficiency, AS often require robustness and safety guarantees for real-world deployment. In this thesis, we propose model-free RL approaches that obtain controllers for AS operating in unknown, stochastic, and potentially adversarial environments directly from linear temporal logic (LTL) specifications defined on state labels, such as safety and liveness requirements. This ensures that the learned controllers satisfy the desired properties, avoid unintended consequences, and remain robust against adversarial behavior.

We first derive a novel rewarding and discounting mechanism from the LTL specifications for Markov decision processes. We show that a policy learned by a model-free RL algorithm, which maximizes the sum of these discounted rewards, also maximizes the probability of satisfying the LTL specifications. We generalize this approach to multiple objectives, where the utmost priority is given to ensuring safety; satisfaction of the other LTL specifications takes a secondary role, and the tertiary objective is to enhance the quality of control.

We then extend our results to zero-sum stochastic games to ensure the robustness of learned controllers against any unpredictable, nondeterministic environment behavior. Addressing the scalability challenges inherent in learning controllers for stochastic games, we propose heuristics and approximate methods to further accelerate the learning process. We illustrate how our approach can be used to learn controllers that are resilient against stealthy attackers capable of disrupting the agent's actuation without being detected. We further discuss an approach for cases where state labels are absent; it aims to learn a labeling function that translates raw state information into object properties usable in LTL specifications, thereby enabling the learning of controllers from LTL specifications.

We conclusively show the effectiveness of our approaches in learning optimal controllers through numerous case studies. These controllers maximize the probability of satisfying LTL specifications in the worst case, thereby exhibiting resilience against adversarial behavior. Moreover, our methods demonstrate scalability across a broad spectrum of LTL specifications, consistently surpassing the performance of existing approaches.
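The abstract's first contribution is a rewarding and discounting mechanism derived from LTL specifications, but it does not spell the construction out. The sketch below shows one plausible shape such a mechanism can take in tabular Q-learning on a product MDP (an MDP state paired with an automaton state), where accepting states carry their own reward and discount; the constants, the accepting-state test, and the Q-learning wrapper are all illustrative assumptions, not the thesis's derivation:

```python
# Hedged sketch: state-dependent reward and discount on a product MDP
# (mdp_state, automaton_state), plugged into tabular Q-learning. The values
# of GAMMA and GAMMA_B and the reward shape are assumptions; the thesis
# derives its own mechanism with formal guarantees.
from collections import defaultdict

GAMMA, GAMMA_B = 0.99, 0.999          # base discount / accepting-state discount (toy)

def reward_and_discount(aut_state, accepting_states):
    """Reward visits to accepting automaton states; discount depends on the state."""
    if aut_state in accepting_states:
        return 1.0 - GAMMA_B, GAMMA_B # small reward, near-1 discount when accepting
    return 0.0, GAMMA                 # no reward otherwise

Q = defaultdict(float)

def q_update(s, a, s_next, actions, accepting_states, alpha=0.1):
    """One Q-learning step; reward and discount come from the automaton part of s."""
    _, aut = s                        # product state: (mdp_state, automaton_state)
    r, gamma = reward_and_discount(aut, accepting_states)
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```

With rewards of 1 - GAMMA_B paid only in accepting states, the discounted return along any run stays in [0, 1], which is what allows it to be related to a satisfaction probability as both discount factors approach 1.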