# Browsing by Subject "Reinforcement Learning"


**Feature Selection for Value Function Approximation** (2011), Taylor, Gavin [Open Access]

The field of reinforcement learning concerns the question of automated action selection given past experiences. As an agent moves through the state space, it must recognize which states best allow it to reach its goal. This is quantified with value functions, which evaluate a state and return the sum of rewards the agent can expect to receive from that state. Given a good value function, the agent can choose the actions which maximize this sum of rewards. Value functions are often chosen from a linear space defined by a set of features; this method offers a concise structure, low computational effort, and resistance to overfitting. However, because the number of features is small, this method depends heavily on those few features being expressive and useful, making the selection of these features a core problem. This dissertation addresses that selection problem.

Aside from a review of the field, contributions include a new understanding of the role approximate models play in value function approximation, leading to new methods for analyzing feature sets in an intuitive way, using both the linear and the related kernelized approximation architectures. Additionally, we present a new method for automatically choosing features during value function approximation which has bounded approximation error and produces superior policies, even in extremely noisy domains.
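The linear architecture described above can be made concrete with a short sketch. The following is a minimal illustration, not the dissertation's method, of least-squares temporal-difference (LSTD) learning: given a fixed feature map, it fits weights so that each state's value is a linear combination of its features. The two-state chain and the feature map are hypothetical.

```python
import numpy as np

def lstd(transitions, phi, gamma=0.9):
    """Least-squares TD: fit weights w so that V(s) ~ phi(s) @ w,
    by solving A w = b with A = sum phi(s)(phi(s) - gamma*phi(s'))^T
    and b = sum phi(s)*r over observed transitions (s, r, s')."""
    k = phi(transitions[0][0]).shape[0]
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, r, s_next in transitions:
        f = phi(s)
        A += np.outer(f, f - gamma * phi(s_next))
        b += f * r
    return np.linalg.solve(A, b)

# Hypothetical two-state chain: state 0 -> state 1 (reward 0),
# state 1 -> state 1 (reward 1). Features: bias + state index.
phi = lambda s: np.array([1.0, float(s)])
transitions = [(0, 0.0, 1), (1, 1.0, 1)] * 50
w = lstd(transitions, phi, gamma=0.9)
V0, V1 = phi(0) @ w, phi(1) @ w   # V1 = 1/(1-0.9) = 10, V0 = 0.9*V1 = 9
```

Because the true value function here happens to lie in the span of the two features, LSTD recovers it exactly; with a poor feature set it would instead return the best fixed point within the feature space, which is exactly why feature selection matters.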

**Innovations in Decompression Sickness Prediction and Adaptive Ascent Algorithms** (2023), Di Muro, Gianluca [Embargo]

Decompression Sickness (DCS) is a potentially serious medical condition which can occur in humans when there is a decrease in ambient pressure. While it is generally accepted that DCS is initiated by the formation and growth of inert gas bubbles in the body, the mechanisms of its various forms are not completely understood. Complicating matters, divers often face challenges in adhering to predetermined safe ascent paths due to unpredictable environmental conditions. Therefore, the challenge of improving dive safety is twofold: 1) enhancing the accuracy of models in predicting DCS risk for a given dive profile; 2) developing algorithms that recommend safe ascent profiles and can adapt in real time to new, unforeseen diving conditions. This dissertation addresses both problems in the context of diving applications.

First, we examine how DCS risk is partitioned in air decompression dives to identify which portion of the dive is the most challenging. Our findings show that most of the risk may be accrued at the surface or during the ascent phase, depending on the specific mission parameters. Subsequently, we conducted a comprehensive investigation into DCS models incorporating inter-tissue perfusion dynamics and proposed a novel algorithm to optimize these models efficiently. Our results determined that a model neglecting the coupling of faster tissues to slower tissues outperformed all other models on O2 surface decompression dive profiles. We further experimented with various compartment tissue connections, involving diffusion phenomena and introducing delayed dynamics, while also exploring different risk functions. By adopting the Akaike Information Criterion, we found that the best-performing model on the training set was BQE22AXT4, a four-compartment model featuring a risk-threshold term only in the fourth compartment. Conversely, the classical Linear-Exponential model demonstrated superior performance on the extrapolation set.

Finally, we introduce a real-time algorithm that delivers a safe and time-optimized ascent path capable of adapting to unanticipated conditions. Our approach combines advanced machine learning techniques with backward optimal control. Through comprehensive analysis, we demonstrate that this methodology attains a safety level on par with precomputed NAVY tables, while offering the added advantage of dynamic adaptation in response to unexpected events.
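The model comparison via the Akaike Information Criterion can be illustrated with a minimal sketch. AIC trades goodness of fit against parameter count: AIC = 2k − 2 ln L, and the lowest score wins. The model names, log-likelihoods, and parameter counts below are hypothetical placeholders, not values from the dissertation.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2*ln(L); lower is better."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical fits: (name, maximized log-likelihood, free parameters).
# The richer model fits better, but pays a penalty for extra parameters.
candidates = [
    ("LE1",       -120.0, 3),   # simple Linear-Exponential-style model
    ("BQE22AXT4", -114.0, 8),   # richer multi-compartment model
]
scores = {name: aic(ll, k) for name, ll, k in candidates}
best = min(candidates, key=lambda m: aic(m[1], m[2]))
```

With these illustrative numbers the richer model's likelihood gain outweighs its parameter penalty (244 vs. 246), so it is selected; on a different dataset, such as the extrapolation set mentioned above, the ranking can reverse.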

**Locally Adaptive Protocols for Quantum State Discrimination** (2021), Brandsen, Sarah [Open Access]

This dissertation makes contributions to two rapidly developing fields: quantum information theory and machine learning. It has recently been demonstrated that reinforcement learning is an effective tool for a wide variety of tasks in quantum information theory, ranging from quantum error correction to quantum control to preparation of entangled states. In this work, we demonstrate that reinforcement learning is additionally highly effective for the task of multiple quantum hypothesis testing.

Quantum hypothesis testing consists of finding the quantum measurement which allows one to discriminate with minimal error between $m$ possible states $\{\rho_{k}\}_{k=1}^{m}$ of a quantum system with corresponding prior probabilities $p_{k} = \text{Pr}[\rho = \rho_{k}]$. In the general case, although semidefinite programming offers a way to numerically approximate the optimal solution \cite{Eldar_Semidefinite2}, a closed-form analytical solution for the optimal measurement is not known.
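For the binary case ($m=2$) a closed form does exist, the Helstrom bound: the minimal error probability is $\tfrac{1}{2}\left(1 - \|p_1\rho_1 - p_2\rho_2\|_1\right)$, where $\|\cdot\|_1$ is the trace norm. A minimal numerical sketch (the example states are arbitrary, chosen only for illustration):

```python
import numpy as np

def helstrom_error(rho1, rho2, p1=0.5):
    """Minimal error probability for discriminating two quantum states
    (Helstrom bound): p_err = (1 - ||p1*rho1 - (1-p1)*rho2||_1) / 2.
    The trace norm of the Hermitian matrix is the sum of |eigenvalues|."""
    gamma = p1 * rho1 - (1 - p1) * rho2
    trace_norm = np.abs(np.linalg.eigvalsh(gamma)).sum()
    return 0.5 * (1.0 - trace_norm)

# Two pure qubit states |0> and |+> with equal priors.
rho0 = np.array([[1.0, 0.0], [0.0, 0.0]])
plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho_plus = np.outer(plus, plus)
p_err = helstrom_error(rho0, rho_plus)   # = (1 - 1/sqrt(2))/2
```

For orthogonal states the bound gives zero error; for the overlapping pair above it gives roughly 14.6%, the floor that any measurement, collective or local, must respect.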

Additionally, when the quantum system is large and consists of many subsystems, the optimal measurement may be experimentally difficult to implement. In this work, we provide a comprehensive study of locally adaptive approaches to quantum hypothesis testing where only a single subsystem is measured at a time and the order and types of measurements implemented may depend on previous measurement results. Thus, these locally adaptive protocols present an experimentally feasible approach to quantum state discrimination.

We begin with the case of binary hypothesis testing (where $m=2$), and generalize previous work by Acin et al. (Phys. Rev. A 71, 032338) to show that a simple Bayesian-updating scheme can optimally distinguish between any pair of arbitrary pure, tensor product quantum states. We then demonstrate that this same Bayesian-updating scheme has poor asymptotic behavior when the candidate states are not pure, and based on this we introduce a modified scheme with strictly better performance. Finally, a dynamic programming (DP) approach is used to find the optimal local protocol for binary state discrimination and numerical simulations are run for both qubit and qutrit subsystems.
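The Bayesian-updating idea can be sketched as follows: after a POVM measurement on a single subsystem, the prior over the candidate hypotheses is re-weighted by the outcome likelihoods, and the updated prior informs the choice of the next local measurement. This is an illustrative toy with a perfectly distinguishable qubit pair, not the protocol from the dissertation.

```python
import numpy as np

def bayes_update(priors, local_states, povm, outcome):
    """After observing `outcome` from a POVM on one subsystem, update the
    prior over hypotheses: p_k' is proportional to p_k * Tr(M_outcome @ rho_k)."""
    likelihoods = np.array([np.trace(povm[outcome] @ rho).real
                            for rho in local_states])
    posterior = priors * likelihoods
    return posterior / posterior.sum()

# Hypothetical binary example: the measured subsystem is |0><0| under
# hypothesis A and |1><1| under hypothesis B; measure in the computational basis.
rho_a = np.diag([1.0, 0.0])
rho_b = np.diag([0.0, 1.0])
povm = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]   # projectors onto |0>, |1>
post = bayes_update(np.array([0.5, 0.5]), [rho_a, rho_b], povm, outcome=0)
```

Here a single outcome resolves the hypothesis completely because the local states are orthogonal; with overlapping states the posterior sharpens only gradually, which is where the choice and ordering of subsequent local measurements becomes the interesting design problem.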

Based on these results, we then turn to the more general case of multiple hypothesis testing where there may be several candidate states. Given that the dynamic-programming approach has a high complexity when there are a large number of subsystems, we turn to reinforcement learning methods to learn adaptive protocols for even larger systems. Our numerical results support the claim that reinforcement learning with neural networks (RLNN) is able to successfully find the optimal locally adaptive approach for up to 20 subsystems. We additionally find the optimal collective measurement through semidefinite programming techniques, and demonstrate that the RLNN approach meets or comes close to the optimal collective measurement in every random trial.

Next, we focus on quantum information theory and provide an operational interpretation for the entropy of a channel. This task is motivated by the central role of entropy across several areas of physics and science. We use games of chance as a systematic and unifying approach to defining entropy, as a system's performance in any game of chance depends solely on the uncertainty of its output. We construct families of games which induce pre-orders on channels (corresponding to majorization, conditional majorization, and channel majorization, respectively) and provide an operational interpretation for each; this defines the unique asymptotically continuous entropy function for classical channels.

**Model-based Reinforcement Learning in Modified Levy Jump-Diffusion Markov Decision Model and Its Financial Applications** (2017-11-15), Zhu, Zheqing [Open Access]

This thesis addresses an important cause of the 2007-2008 financial crisis, the non-normality of asset returns, by incorporating the prediction of asset-price jumps into asset pricing models. Several machine learning techniques are used, including the Unscented Kalman Filter and Approximate Planning, and an improvement to Approximate Planning is developed that reduces the algorithm's time complexity with limited loss in optimality. We obtain significant results in predicting jumps using market-sentiment memory extracted from Twitter. With this model, we develop a reinforcement learning module that achieves good performance, capturing over 60% of profitable periods in the market.

**Nonlinear Energy Harvesting With Tools From Machine Learning** (2020), Wang, Xueshe [Open Access]

Energy harvesting is a process where self-powered electronic devices scavenge ambient energy and convert it to electrical power. Traditional linear energy harvesters, which operate based on linear resonance, work well only when the excitation frequency is close to the harvester's natural frequency. While various control methods applied to an energy harvester realize resonant-frequency tuning, they are either energy-consuming or exhibit low efficiency when operating under multi-frequency excitations. To overcome these limitations of linear energy harvesters, researchers have recently suggested using "nonlinearity" for broad-band frequency response.

Building on existing investigations of nonlinear energy harvesting, this dissertation introduces a novel class of energy harvester designs, based on translational-to-rotational conversion, that offer space efficiency and intentional nonlinearity. Two dynamical systems are presented: 1) vertically forced rocking elliptical disks, and 2) a non-contact magnetic transmission. Both systems realize translational-to-rotational conversion and exhibit nonlinear behaviors which are beneficial to broad-band energy harvesting.

This dissertation also explores novel methods to overcome a key limitation of nonlinear energy harvesting: the presence of coexisting attractors. A control method is proposed to keep a nonlinear harvesting system operating on the desired attractor. This method is based on reinforcement learning and is shown to work under various control constraints while optimizing energy consumption.

Apart from these investigations of energy harvesting, several techniques are presented to improve the efficiency of analyzing generic linear/nonlinear dynamical systems: 1) an analytical method for stroboscopically sampling general periodic functions with arbitrary frequency sweep rates, and 2) a model-free sampling method for estimating basins of attraction using hybrid active learning.
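Plain Monte Carlo estimation, the baseline that such a model-free sampling method improves upon, can be sketched as follows: iterate the dynamics from many random initial conditions and record which attractor each trajectory settles onto. The bistable system and all parameters below are hypothetical.

```python
import numpy as np

def basin_fractions(step, attractors, samples, n_iter=500, tol=1e-3):
    """Monte Carlo estimate of each attractor's basin fraction: iterate
    the dynamics from each initial condition, then assign the trajectory
    to the nearest attractor if it has converged within `tol`."""
    counts = np.zeros(len(attractors))
    for x in samples:
        for _ in range(n_iter):
            x = step(x)
        dists = [abs(x - a) for a in attractors]
        if min(dists) < tol:
            counts[int(np.argmin(dists))] += 1
    return counts / len(samples)

# Hypothetical bistable system dx/dt = x - x^3 (attractors at -1 and +1),
# integrated with a simple Euler step.
step = lambda x: x + 0.1 * (x - x**3)
rng = np.random.default_rng(0)
samples = rng.uniform(-2.0, 2.0, 500)
fractions = basin_fractions(step, [-1.0, 1.0], samples)
```

By symmetry each basin covers about half of the sampled interval, so both estimated fractions come out near 0.5. The cost is one full trajectory per sample, which is precisely what an active-learning sampler tries to reduce by choosing informative initial conditions near the basin boundary instead of sampling blindly.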

**PAC-optimal, Non-parametric Algorithms and Bounds for Exploration in Concurrent MDPs with Delayed Updates** (2015), Pazis, Jason [Open Access]

As the reinforcement learning community has shifted its focus from heuristic methods to methods with performance guarantees, PAC-optimal exploration algorithms have received significant attention. Unfortunately, the majority of current PAC-optimal exploration algorithms are inapplicable in realistic scenarios: 1) they scale poorly to domains of realistic size; 2) they are only applicable to discrete state-action spaces; 3) they assume that experience comes from a single, continuous trajectory; 4) they assume that value function updates are instantaneous. The goal of this work is to bridge the gap between theory and practice by introducing an efficient and customizable PAC-optimal exploration algorithm that is able to explore in multiple, continuous or discrete state MDPs simultaneously. Our algorithm does not assume that value function updates can be completed instantaneously, and it maintains PAC guarantees in real-time environments. Not only do we extend the applicability of PAC-optimal exploration algorithms to new, realistic settings, but even when instant value function updates are possible, our bounds present a significant improvement over previous single-MDP exploration bounds and a drastic improvement over previous concurrent PAC bounds. We also present Bellman error MDPs, a new analysis methodology for online and offline reinforcement learning algorithms, and TCE, a new, fine-grained metric for the cost of exploration.

**Towards Uncertainty and Efficiency in Reinforcement Learning** (2021), Zhang, Ruiyi [Open Access]

Deep reinforcement learning (RL) has achieved great success in playing video games and strategic board games, where a simulator is well-defined and massive numbers of samples are available. However, in many real-world applications, samples are not easy to collect, and the collection process may be expensive and risky. We consider designing sample-efficient RL algorithms for online exploration and for learning from offline interactions. In this thesis, I introduce algorithms that quantify uncertainty by exploiting intrinsic structure within observations to improve sample complexity. The proposed algorithms are theoretically sound and show broad applicability in recommendation, computer vision, operations management, and natural language processing. This thesis consists of two parts: (i) efficient exploration and (ii) data-driven reinforcement learning.

Exploration-exploitation is widely recognized as a fundamental trade-off: an agent can take exploratory actions to learn a better policy, or exploitative actions with the highest expected reward. A good exploration strategy can improve sample complexity, since a policy converges to near-optimality faster when it collects informative data. Better estimation and use of uncertainty lead to more efficient exploration, as the agent can explore specifically to reduce its uncertainty about the environment. In the efficient-exploration part, we place reinforcement learning in probability measure space and formulate it as a Wasserstein gradient flow. The proposed method can quantify the uncertainty of value, policy, and constraint functions to provide efficient exploration.

Running a policy in real environments can be expensive and risky, while massive logged datasets are often available. Data-driven RL can effectively exploit these fixed datasets to perform policy improvement or evaluation. In the data-driven RL part, we consider auto-regressive sequence generation as a real-world sequential decision-making problem, where exploiting uncertainty is useful for generating faithful and informative sequences. Specifically, a planning mechanism is integrated into generation as model-predictive sequence generation. We also observe that most RL-based training schemes are not aligned with human evaluations due to poor lexical rewards or simulators. To alleviate this issue, we consider semantic rewards, implemented via a generalized Wasserstein distance. These new schemes can also be interpreted as Wasserstein gradient flows.