PAC-optimal, Non-parametric Algorithms and Bounds for Exploration in Concurrent MDPs with Delayed Updates
Date
2015
Authors
Advisors
Journal Title
Journal ISSN
Volume Title
Repository Usage Stats
views
downloads
Abstract
As the reinforcement learning community has shifted its focus from heuristic methods to methods that have performance guarantees, PAC-optimal exploration algorithms have received significant attention. Unfortunately, the majority of current PAC-optimal exploration algorithms are inapplicable in realistic scenarios: 1) They scale poorly to domains of realistic size. 2) They are only applicable to discrete state-action spaces. 3) They assume that experience comes from a single, continuous trajectory. 4) They assume that value function updates are instantaneous. The goal of this work is to bridge the gap between theory and practice, by introducing an efficient and customizable PAC optimal exploration algorithm, that is able to explore in multiple, continuous or discrete state MDPs simultaneously. Our algorithm does not assume that value function updates can be completed instantaneously, and maintains PAC guarantees in realtime environments. Not only do we extend the applicability of PAC optimal exploration algorithms to new, realistic settings, but even when instant value function updates are possible, our bounds present a significant improvement over previous single MDP exploration bounds, and a drastic improvement over previous concurrent PAC bounds. We also present Bellman error MDPs, a new analysis methodology for online and offline reinforcement learning algorithms, and TCE, a new, fine grained metric for the cost of exploration.
Type
Department
Description
Provenance
Citation
Permalink
Citation
Pazis, Jason (2015). PAC-optimal, Non-parametric Algorithms and Bounds for Exploration in Concurrent MDPs with Delayed Updates. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/11334.
Collections
Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.