Browsing by Author "Parr, Ronald"

Item Open Access: Efficient selection of disambiguating actions for stereo vision (2010). Schaeffer, Monika.
In many domains that involve the use of sensors, such as robotics or sensor networks, there are opportunities to use some form of active sensing to disambiguate data from noisy or unreliable sensors. These disambiguating actions typically take time and expend energy. One way to choose the next disambiguating action is to select the action with the greatest expected entropy reduction, or information gain. In this work, we consider active sensing in aid of stereo vision for robotics. Stereo vision is a powerful sensing technique for mobile robots, but it can fail in scenes that lack strong texture. In such cases, a structured light source, such as a vertical laser line, can be used for disambiguation. By treating the stereo matching problem as a specially structured HMM-like graphical model, we demonstrate that for a scan line with n columns and maximum stereo disparity d, the entropy-minimizing aim point for the laser can be selected in O(nd) time, a cost no greater than that of the stereo algorithm itself. A typical HMM formulation would suggest at least O(nd^2) time for the entropy calculation alone.

Item Open Access: Feature Selection for Value Function Approximation (2011). Taylor, Gavin.
The field of reinforcement learning concerns the question of automated action selection given past experiences. As an agent moves through the state space, it must recognize which state choices are best in terms of allowing it to reach its goal. This is quantified with value functions, which evaluate a state and return the sum of rewards the agent can expect to receive from that state. Given a good value function, the agent can choose the actions which maximize this sum of rewards. Value functions are often chosen from a linear space defined by a set of features; this method offers a concise structure, low computational effort, and resistance to overfitting. However, because the number of features is small, this method depends heavily on these few features being expressive and useful, making the selection of these features a core problem. This document discusses this selection.
Aside from a review of the field, contributions include a new understanding of the role approximate models play in value function approximation, leading to new methods for analyzing feature sets in an intuitive way, using both the linear and the related kernelized approximation architectures. Additionally, we present a new method for automatically choosing features during value function approximation which has a bounded approximation error and produces superior policies, even in extremely noisy domains.
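As a concrete illustration of the linear architecture discussed in the entry above, the following is a minimal sketch of fitting value-function weights to sampled returns by least squares. The feature map, states, and returns are invented placeholders, not data or methods from the dissertation.

```python
import numpy as np

# Minimal sketch: linear value function approximation V(s) ~= phi(s) . w.
# The feature map and sampled data below are illustrative placeholders.

def features(state):
    """Map a scalar state to a small hand-picked feature vector."""
    return np.array([1.0, state, state ** 2])

rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=200)
true_values = 3.0 - 2.0 * states                         # pretend "true" value function
returns = true_values + rng.normal(0.0, 0.1, size=200)   # noisy sampled returns

Phi = np.stack([features(s) for s in states])       # design matrix, one row per sample
w, *_ = np.linalg.lstsq(Phi, returns, rcond=None)   # least-squares fit of the weights

print("learned weights:", w)
print("V(0.5) ~=", features(0.5) @ w)
```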

Item Open Access: In Pursuit of Simplicity: The Role of the Rashomon Effect for Informed Decision Making (2024). Semenova, Lesia.
For high-stakes decision domains, such as healthcare, lending, and criminal justice, the predictions of deployed models can have a huge impact on human lives. Understanding why models make specific predictions is as crucial as the good performance of those models. Interpretable models, constrained to explain the reasoning behind their decisions, play a key role in enabling users' trust. They can also assist in troubleshooting and in identifying errors or data biases. However, there has been a longstanding belief in the community that a trade-off exists between accuracy and interpretability. We formally show that such a trade-off does not exist for many datasets in high-stakes decision domains and that simpler models often perform as well as black boxes.
To establish a theoretical foundation explaining the existence of simple-yet-accurate models, we leverage the Rashomon set (a set of equally well-performing models). If the Rashomon set is large, it contains numerous accurate models, and perhaps at least one of them is the simple model we desire. We formally present the Rashomon ratio as a new gauge of simplicity for a learning problem, where the Rashomon ratio is the fraction of models in a given hypothesis space that lie in the Rashomon set. Insight from studying the Rashomon ratio provides an easy way to check whether a simpler model might exist for a problem before finding it. In that sense, the Rashomon ratio is a powerful tool for understanding when an accurate-yet-simple model might exist. We further propose and study a mechanism of the data generation process, coupled with choices usually made by the analyst during the learning process, that determines the size of the Rashomon ratio. Specifically, we demonstrate that noisier datasets lead to larger Rashomon ratios through the way practitioners train models. Our results explain a key aspect of why simpler models often tend to perform as well as black-box models on complex, noisier datasets.
Given that optimizing for interpretable models is known to be NP-hard and can require significant domain expertise, our foundation can help machine learning practitioners assess the feasibility of finding simple-yet-accurate models before attempting to optimize for them. We illustrate, on a complex biological dataset, how larger Rashomon sets and noise in the data generation process explain the natural gravitation towards simpler models. We further highlight how simplicity is useful for informed decision-making by introducing sparse density trees and lists, an accurate approach to density estimation that optimizes for sparsity.
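To make the Rashomon ratio concrete, here is a minimal sketch for a finite hypothesis space: models whose empirical loss is within epsilon of the best form the Rashomon set, and the ratio is that set's share of the whole space. The threshold classifiers and synthetic data are illustrative assumptions, not the dissertation's experiments.

```python
import numpy as np

# Sketch: Rashomon ratio over a finite hypothesis space of threshold classifiers.
# Data and hypothesis space are toy placeholders for illustration only.

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=500)
y = (x + rng.normal(0.0, 0.5, size=500) > 0).astype(int)   # noisy labels

thresholds = np.linspace(-2.0, 2.0, 201)                    # finite hypothesis space
losses = np.array([np.mean((x > t).astype(int) != y) for t in thresholds])

epsilon = 0.01                                  # allowed slack above the best loss
rashomon_set = losses <= losses.min() + epsilon
rashomon_ratio = rashomon_set.mean()            # fraction of models in the Rashomon set

print(f"best 0-1 loss: {losses.min():.3f}")
print(f"Rashomon ratio (epsilon={epsilon}): {rashomon_ratio:.3f}")
```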

Item Open Access: Model-based Reinforcement Learning in Modified Levy Jump-Diffusion Markov Decision Model and Its Financial Applications (2017-11-15). Zhu, Zheqing.
This thesis intends to address an important cause of the 2007-2008 financial crisis, the non-normality of asset returns, by incorporating prediction of asset-pricing jumps into asset pricing models. Several machine learning techniques, including the Unscented Kalman Filter and Approximate Planning, are used, and an improvement to Approximate Planning is developed that reduces algorithm time complexity with limited loss in optimality. We obtain significant results in predicting jumps with market sentiment memory extracted from Twitter. With the model, we develop a reinforcement learning module that achieves good performance and captures over 60% of profitable periods in the market.

Item Open Access: Non-parametric approximate linear programming for MDPs (2012). Pazis, Jason.
One of the most difficult tasks in value-function-based methods for learning in Markov Decision Processes is finding an approximation architecture that is expressive enough to capture the important structure in the value function, while at the same time not overfitting the training samples. This thesis presents a novel Non-Parametric approach to Approximate Linear Programming (NP-ALP), which requires nothing more than a smoothness assumption on the value function. NP-ALP can make use of real-world, noisy sampled transitions rather than requiring samples from the full Bellman equation, while providing the first known max-norm, finite-sample performance guarantees for ALP under mild assumptions. Additionally, NP-ALP is amenable to problems with large (multidimensional) or even infinite (continuous) action spaces, and does not require a model to select actions using the resulting approximate solution.

Item Open Access: Object Discovery with a Mobile Robot (2013). Mason, Julian.
The world is full of objects: cups, phones, computers, books, and countless other things. For many tasks, robots need to understand that this object is a stapler, that object is a textbook, and this other object is a gallon of milk. The classic approach to this problem is object recognition, which classifies each observation into one of several previously defined classes. While modern object recognition algorithms perform well, they require extensive supervised training: in a standard benchmark, the training data average more than four hundred images of each object class.
The cost of manually labeling the training data prevents these techniques from scaling to general environments. Homes and workplaces can contain hundreds of unique objects, and the objects in one environment may not appear in another.
We propose a different approach: object discovery. Rather than rely on manual labeling, we describe unsupervised algorithms that leverage the unique capabilities of a mobile robot to discover the objects (and classes of objects) in an environment. Because our algorithms are unsupervised, they scale gracefully to large, general environments over long periods of time. To validate our results, we collected 67 robotic runs through a large office environment. This dataset, which we have made available to the community, is the largest of its kind.
At each step, we treat the problem as one of robotics, not disembodied computer vision. The scale and quality of our results demonstrate the merit of this perspective and prove the practicality of long-term, large-scale object discovery.
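The abstract above does not spell out the discovery algorithms, so purely as an illustration of the unsupervised idea, the sketch below groups candidate object observations by feature similarity with a plain k-means loop; the feature vectors and cluster count are invented for the example.

```python
import numpy as np

# Illustrative sketch only: cluster feature vectors of candidate object
# observations so that each cluster plays the role of a discovered object class.
# The features and cluster count are hypothetical, not the dissertation's method.

rng = np.random.default_rng(2)
observations = np.vstack([rng.normal(center, 0.2, size=(40, 3))
                          for center in ([0, 0, 0], [2, 2, 0], [0, 3, 3])])

k = 3
centers = observations[rng.choice(len(observations), size=k, replace=False)]
for _ in range(20):                                   # plain k-means iterations
    dists = np.linalg.norm(observations[:, None] - centers[None], axis=2)
    labels = dists.argmin(axis=1)
    centers = np.vstack([observations[labels == j].mean(axis=0) for j in range(k)])

print("discovered cluster sizes:", np.bincount(labels, minlength=k))
```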

Item Open Access: PAC-optimal, Non-parametric Algorithms and Bounds for Exploration in Concurrent MDPs with Delayed Updates (2015). Pazis, Jason.
As the reinforcement learning community has shifted its focus from heuristic methods to methods that have performance guarantees, PAC-optimal exploration algorithms have received significant attention. Unfortunately, the majority of current PAC-optimal exploration algorithms are inapplicable in realistic scenarios: 1) they scale poorly to domains of realistic size; 2) they are only applicable to discrete state-action spaces; 3) they assume that experience comes from a single, continuous trajectory; and 4) they assume that value function updates are instantaneous. The goal of this work is to bridge the gap between theory and practice by introducing an efficient and customizable PAC-optimal exploration algorithm that is able to explore in multiple continuous or discrete state MDPs simultaneously. Our algorithm does not assume that value function updates can be completed instantaneously, and it maintains PAC guarantees in real-time environments. Not only do we extend the applicability of PAC-optimal exploration algorithms to new, realistic settings, but even when instant value function updates are possible, our bounds present a significant improvement over previous single-MDP exploration bounds, and a drastic improvement over previous concurrent PAC bounds. We also present Bellman error MDPs, a new analysis methodology for online and offline reinforcement learning algorithms, and TCE, a new, fine-grained metric for the cost of exploration.
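The dissertation's algorithm is not reproduced here; as background, the sketch below shows the classic "known state-action" device used by many PAC-MDP exploration methods (R-max style): a pair keeps an optimistic value until it has been visited m times. All constants and names are assumptions for the example.

```python
from collections import defaultdict

# Generic R-max-style knownness bookkeeping (not the dissertation's algorithm):
# a state-action pair is treated optimistically until visited M_KNOWN times.

M_KNOWN = 5          # visits required before a pair counts as "known"
R_MAX = 1.0          # optimistic per-step reward bound
GAMMA = 0.95

visit_counts = defaultdict(int)
value_estimates = defaultdict(float)

def optimistic_value(state, action):
    """Return the learned estimate if (s, a) is known, else an optimistic bound."""
    if visit_counts[(state, action)] >= M_KNOWN:
        return value_estimates[(state, action)]
    return R_MAX / (1.0 - GAMMA)     # drives the agent toward under-explored pairs

def record_visit(state, action, estimate):
    visit_counts[(state, action)] += 1
    value_estimates[(state, action)] = estimate

record_visit("s0", "left", 3.2)
print(optimistic_value("s0", "left"), optimistic_value("s0", "right"))
```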

Item Open Access: Sparse Value Function Approximation for Reinforcement Learning (2013). Painter-Wakefield, Christopher Robert.
A key component of many reinforcement learning (RL) algorithms is the approximation of the value function. The design and selection of features for approximation in RL is crucial, and an ongoing area of research. One approach to the problem of feature selection is to apply sparsity-inducing techniques in learning the value function approximation; such sparse methods tend to select relevant features and ignore irrelevant features, thus automating the feature selection process. This dissertation describes three contributions in the area of sparse value function approximation for reinforcement learning.
One method for obtaining sparse linear approximations is the inclusion in the objective function of a penalty on the sum of the absolute values of the approximation weights. This L1 regularization approach was first applied to temporal difference learning in the LARS-inspired, batch learning algorithm LARS-TD. In our first contribution, we define an iterative update equation which has as its fixed point the L1 regularized linear fixed point of LARS-TD. The iterative update gives rise naturally to an online stochastic approximation algorithm. We prove convergence of the online algorithm and show that the L1 regularized linear fixed point is an equilibrium fixed point of the algorithm. We demonstrate the ability of the algorithm to converge to the fixed point, yielding a sparse solution with modestly better performance than unregularized linear temporal difference learning.
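For context on the L1 penalty itself (not the LARS-TD fixed point or the online algorithm described above), here is a minimal sketch that solves a small lasso-style regression by iterative soft-thresholding, showing how the penalty on the absolute values of the weights yields a sparse solution; the data are synthetic.

```python
import numpy as np

# Sketch of L1-regularized (lasso-style) regression via iterative soft-thresholding.
# This illustrates how the |w| penalty produces sparse weights; it is not LARS-TD.

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 20))
true_w = np.zeros(20)
true_w[[2, 7, 11]] = [1.5, -2.0, 0.8]              # only three relevant features
y = X @ true_w + rng.normal(0.0, 0.1, size=100)

lam = 5.0                                          # L1 penalty strength
step = 1.0 / np.linalg.eigvalsh(X.T @ X).max()     # safe gradient step size
w = np.zeros(20)
for _ in range(500):
    grad = X.T @ (X @ w - y)                       # squared-error gradient
    w = w - step * grad
    w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)   # soft-threshold

print("nonzero weights:", np.flatnonzero(w))
```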
Our second contribution extends LARS-TD to integrate policy optimization with sparse value learning. We extend the L1 regularized linear fixed point to include a maximum over policies, defining a new, "greedy" fixed point. The greedy fixed point adds a new invariant to the set which LARS-TD maintains as it traverses its homotopy path, giving rise to a new algorithm integrating sparse value learning and optimization. The new algorithm is demonstrated to be similar in performance to policy iteration using LARS-TD.
Finally, we consider another approach to sparse learning, that of using a simple algorithm that greedily adds new features. Such algorithms have many of the good properties of the L1 regularization methods, while also being extremely efficient and, in some cases, allowing theoretical guarantees on recovery of the true form of a sparse target function from sampled data. We consider variants of orthogonal matching pursuit (OMP) applied to RL. The resulting algorithms are analyzed and compared experimentally with existing L1 regularized approaches. We demonstrate that perhaps the most natural scenario in which one might hope to achieve sparse recovery fails; however, one variant provides promising theoretical guarantees under certain assumptions on the feature dictionary while another variant empirically outperforms prior methods both in approximation accuracy and efficiency on several benchmark problems.
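A minimal sketch of orthogonal matching pursuit in a plain regression setting (the RL-specific variants analyzed in the dissertation are not reproduced): greedily add the feature most correlated with the residual, then refit by least squares on the selected set. The data are synthetic.

```python
import numpy as np

# Orthogonal matching pursuit (OMP) sketch on synthetic regression data.
# Greedy feature selection: pick the column most correlated with the residual,
# refit least squares on the selected set, and repeat.

rng = np.random.default_rng(4)
X = rng.normal(size=(120, 30))
true_w = np.zeros(30)
true_w[[4, 9, 21]] = [2.0, -1.0, 1.5]
y = X @ true_w + rng.normal(0.0, 0.05, size=120)

selected = []
residual = y.copy()
for _ in range(3):                                    # number of features to select
    correlations = np.abs(X.T @ residual)
    correlations[selected] = -np.inf                  # never reselect a feature
    selected.append(int(correlations.argmax()))
    w_sel, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
    residual = y - X[:, selected] @ w_sel

print("selected features:", sorted(selected))
```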

Item Open Access: The Weakly Identifying System for Doorway Monitoring (2007-05-10T15:22:30Z). Jenkins, Christopher James.
The System Architecture for Tracking Individuals (SAFTI) is an indoor person location tracking system designed for use in the field of pervasive computing. SAFTI provides location tracking in environments where cameras are too privacy invasive, where tracking devices are too costly, insecure or inconvenient, and where usability is a high priority. While many location tracking systems satisfy each of these constraints individually, SAFTI satisfies all three constraints simultaneously. Upon entering and exiting SAFTI buildings, users submit identification credentials. Once inside the building, using SAFTI is effortless: simply passing through doorways is sufficient for supplying SAFTI with the information it needs to perform location tracking. An integral part of SAFTI is the Weakly Identifying System for Doorway Monitoring (WISDOM). These instrumented doorways contain a variety of infrared, ultrasonic and pressure sensors that detect the direction of passage and measure each user's body size and shape. We quantify the measurement and identification accuracy of WISDOM by analyzing data collected from a user study containing 530 passes through a WISDOM prototype by 10 different subjects. We combine the results from WISDOM with large publicly available anthropometric databases to evaluate how accurately SAFTI performs location tracking with respect to building size, density of occupants, and matching algorithm used.

Item Open Access: Transfer Learning in Value-based Methods with Successor Features (2023). Nemecek, Mark William.
This dissertation investigates the concept of transfer learning in a reinforcement learning (RL) context. Transfer learning is based on the idea that an agent can use what it has learned in one task to improve the learning process in another task, as compared to learning from scratch. This improvement can take multiple forms, such as reducing the number of samples required to reach a given level of performance or even increasing the best performance achieved. In particular, we examine properties and applications of successor features, a useful representation that allows efficient calculation of action-value functions for a given policy in different contexts.
Our first contribution is a method for incremental construction of a cache of policies for a family of tasks. When a family of tasks share transition dynamics but differ in reward function, successor features allow us to efficiently compute the action-value functions for known policies in new tasks. As the optimal policy for a new task might be the same as or similar to that for a previous task, it is not always necessary for an agent to learn a new policy for each new task it encounters, especially if it is allowed some amount of suboptimality. We present new bounds for the performance of optimal policies in a new task, as well as an approach to use these bounds to decide, when presented with a new task, whether to use cached policies or learn a new policy.
In our second contribution, we examine the problem of hierarchical reinforcement learning, which involves breaking a task down into smaller subtasks which are easier to solve, through the lens of transfer learning. Within a single task, a subtask may encapsulate a behavior which could be used multiple times for completing the task, but occur in different contexts, such as opening doors while navigating a building. When the reward function changes between tasks, a given subtask may be unaffected, i.e., the optimal behavior within that subtask may remain the same. If so, the behavior may be immediately reused to accelerate training of behaviors for other subtasks. In both of these cases, reusing the learned behavior can be viewed as a transfer learning problem. We introduce a method based on the MAXQ value function decomposition which uses two applications of successor features to facilitate both transfer within a task and transfer between tasks with different reward functions.
The final contribution of this dissertation introduces a method for transfer using a value-based approach in domains with continuous actions. When an environment's action space is continuous, efficiently finding the action which maximizes an action-value function approximator often requires defining a constrained approximator, which results in suboptimal behavior. Recently, the RBF-DQN approach was proposed, which uses deep radial-basis value functions to allow efficient maximization of an action-value approximator over the actions while not losing the universal approximator property of neural networks. We present a method which extends this approach to use successor features in order to allow for effective transfer learning between tasks which differ in reward function.
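To make the successor-feature mechanism in the entry above concrete: if rewards factor as r(s, a) = phi(s, a) . w_task, then the successor features psi of a cached policy give its action values in a new task as Q(s, a) = psi(s, a) . w_task, and generalized policy improvement takes the max over cached policies. The sketch below uses random placeholder arrays; the dissertation's cache-reuse bounds, MAXQ decomposition, and RBF-DQN extension are not reproduced.

```python
import numpy as np

# Sketch of reusing cached policies through successor features.
# Assumption: r(s, a) = phi(s, a) . w_task for a task weight vector w_task.
# psi[p, s, a] holds the successor features of cached policy p at (state, action),
# so that policy's action values on the new task are psi[p, s, a] . w_task.

rng = np.random.default_rng(5)
n_policies, n_states, n_actions, n_features = 4, 6, 3, 8

psi = rng.normal(size=(n_policies, n_states, n_actions, n_features))  # placeholder SFs
w_task = rng.normal(size=n_features)                                  # new task's reward weights

q_values = psi @ w_task                    # shape: (policies, states, actions)

state = 2
gpi_action = q_values[:, state, :].max(axis=0).argmax()   # generalized policy improvement
best_cached_policy = q_values[:, state, :].max(axis=1).argmax()

print("GPI action in state", state, "is", int(gpi_action))
print("best cached policy in state", state, "is", int(best_cached_policy))
```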

Item Open Access: Transition Space Distance Learning (2019). Nemecek, Mark William.
The notion of distance plays an important role in many reinforcement learning (RL) techniques. This role may be explicit, as in some non-parametric approaches, or it may be implicit in the architecture of the feature space. The ability to learn distance functions tailored for RL tasks could thus benefit many different RL paradigms. While several approaches to learning distance functions from data do exist, they are frequently intended for use in clustering or classification tasks and typically do not take into account the inherent structure present in trajectories sampled from RL environments. For those that do, this structure is generally used to define a similarity between states rather than to represent the mechanics of the domain. Based on the idea that a good distance function in such a domain would reflect the number of transitions necessary to get from one state to another, we detail an approach to learning distance functions which accounts for the nature of state transitions in a Markov decision process, including their inherent directionality. We then present the results of experiments performed in multiple RL environments to demonstrate the benefit of learning such distance functions.
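As a concrete target for what such a distance should capture (not the learned distance function the dissertation develops), this sketch counts the minimum number of observed transitions needed to reach one state from another by breadth-first search over a directed transition graph; the transitions are an invented toy example, and the asymmetry reflects the directionality noted above.

```python
from collections import defaultdict, deque

# Sketch: minimum-transition distance over a directed graph built from observed
# transitions. This is the quantity a learned distance should approximate;
# the transition list below is a hypothetical toy example.

transitions = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("b", "d")]

graph = defaultdict(set)
for src, dst in transitions:
    graph[src].add(dst)

def transition_distance(start, goal):
    """Fewest observed transitions from start to goal (inf if unreachable)."""
    frontier, seen = deque([(start, 0)]), {start}
    while frontier:
        state, steps = frontier.popleft()
        if state == goal:
            return steps
        for nxt in graph[state]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, steps + 1))
    return float("inf")

print(transition_distance("a", "d"))   # 2 (a -> b -> d)
print(transition_distance("d", "a"))   # 1 (direct edge): the distance is directional
```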