Show simple item record

PAC-optimal, Non-parametric Algorithms and Bounds for Exploration in Concurrent MDPs with Delayed Updates

dc.contributor.advisor Parr, Ronald
dc.contributor.author Pazis, Jason
dc.date.accessioned 2016-01-04T19:26:00Z
dc.date.available 2016-01-04T19:26:00Z
dc.date.issued 2015
dc.identifier.uri https://hdl.handle.net/10161/11334
dc.description.abstract <p>As the reinforcement learning community has shifted its focus from heuristic methods to methods that have performance guarantees, PAC-optimal exploration algorithms have received significant attention. Unfortunately, the majority of current PAC-optimal exploration algorithms are inapplicable in realistic scenarios: 1) They scale poorly to domains of realistic size. 2) They are only applicable to discrete state-action spaces. 3) They assume that experience comes from a single, continuous trajectory. 4) They assume that value function updates are instantaneous. The goal of this work is to bridge the gap between theory and practice, by introducing an efficient and customizable PAC optimal exploration algorithm, that is able to explore in multiple, continuous or discrete state MDPs simultaneously. Our algorithm does not assume that value function updates can be completed instantaneously, and maintains PAC guarantees in realtime environments. Not only do we extend the applicability of PAC optimal exploration algorithms to new, realistic settings, but even when instant value function updates are possible, our bounds present a significant improvement over previous single MDP exploration bounds, and a drastic improvement over previous concurrent PAC bounds. We also present Bellman error MDPs, a new analysis methodology for online and offline reinforcement learning algorithms, and TCE, a new, fine grained metric for the cost of exploration.</p>
dc.subject Computer science
dc.subject Artificial intelligence
dc.subject Concurrent
dc.subject Delay
dc.subject Exploration
dc.subject MDP
dc.subject PAC-optimal
dc.subject Reinforcement Learning
dc.title PAC-optimal, Non-parametric Algorithms and Bounds for Exploration in Concurrent MDPs with Delayed Updates
dc.type Dissertation
dc.department Computer Science


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record