PAC-optimal, Non-parametric Algorithms and Bounds for Exploration in Concurrent MDPs with Delayed Updates

dc.contributor.advisor

Parr, Ronald

dc.contributor.author

Pazis, Jason

dc.date.accessioned

2016-01-04T19:26:00Z

dc.date.available

2016-01-04T19:26:00Z

dc.date.issued

2015

dc.department

Computer Science

dc.description.abstract

As the reinforcement learning community has shifted its focus from heuristic methods to methods that have performance guarantees, PAC-optimal exploration algorithms have received significant attention. Unfortunately, the majority of current PAC-optimal exploration algorithms are inapplicable in realistic scenarios: 1) They scale poorly to domains of realistic size. 2) They are only applicable to discrete state-action spaces. 3) They assume that experience comes from a single, continuous trajectory. 4) They assume that value function updates are instantaneous. The goal of this work is to bridge the gap between theory and practice, by introducing an efficient and customizable PAC optimal exploration algorithm, that is able to explore in multiple, continuous or discrete state MDPs simultaneously. Our algorithm does not assume that value function updates can be completed instantaneously, and maintains PAC guarantees in realtime environments. Not only do we extend the applicability of PAC optimal exploration algorithms to new, realistic settings, but even when instant value function updates are possible, our bounds present a significant improvement over previous single MDP exploration bounds, and a drastic improvement over previous concurrent PAC bounds. We also present Bellman error MDPs, a new analysis methodology for online and offline reinforcement learning algorithms, and TCE, a new, fine grained metric for the cost of exploration.

dc.identifier.uri

https://hdl.handle.net/10161/11334

dc.subject

Computer science

dc.subject

Artificial intelligence

dc.subject

Concurrent

dc.subject

Delay

dc.subject

Exploration

dc.subject

MDP

dc.subject

PAC-optimal

dc.subject

Reinforcement learning

dc.title

PAC-optimal, Non-parametric Algorithms and Bounds for Exploration in Concurrent MDPs with Delayed Updates

dc.type

Dissertation

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Pazis_duke_0066D_13168.pdf
Size:
693.76 KB
Format:
Adobe Portable Document Format

Collections