Non-parametric approximate linear programming for MDPs

Date

2012


Repository Usage Stats

284 views, 379 downloads

Abstract

One of the most difficult tasks in value-function-based methods for learning in Markov Decision Processes is finding an approximation architecture that is expressive enough to capture the important structure in the value function while not overfitting the training samples. This thesis presents a novel non-parametric approach to approximate linear programming (NP-ALP), which requires nothing more than a smoothness assumption on the value function. NP-ALP can make use of real-world, noisy sampled transitions rather than requiring samples from the full Bellman equation, while providing the first known max-norm, finite-sample performance guarantees for ALP under mild assumptions. Additionally, NP-ALP is amenable to problems with large (multidimensional) or even infinite (continuous) action spaces, and does not require a model to select actions using the resulting approximate solution.
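For context, the sketch below shows the classic approximate-linear-programming formulation that NP-ALP builds on, solved on a tiny randomly generated MDP with SciPy. It is only an illustration, not the thesis's NP-ALP method: NP-ALP would replace the tabular value variables here with values at sampled states tied together by Lipschitz (smoothness) constraints, and all names and parameters below are assumptions for the example.

```python
# Minimal sketch of the standard LP formulation of an MDP value function.
# NP-ALP (per the abstract) replaces the parametric/tabular representation
# with a smoothness assumption over sampled states; this example only shows
# the baseline linear program that such methods approximate.
import numpy as np
from scipy.optimize import linprog

n_states, n_actions, gamma = 3, 2, 0.95
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(size=(n_states, n_actions))                       # R[s, a]

# Objective: minimize sum_s V(s) (uniform state-relevance weights).
c = np.ones(n_states)

# Bellman constraints V(s) >= R[s,a] + gamma * sum_s' P[s,a,s'] * V(s'),
# rewritten as A_ub @ V <= b_ub for linprog.
A_ub, b_ub = [], []
for s in range(n_states):
    for a in range(n_actions):
        row = gamma * P[s, a].copy()
        row[s] -= 1.0                  # -(V(s) - gamma * sum P V)
        A_ub.append(row)
        b_ub.append(-R[s, a])

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=(None, None))
print("LP value function:", res.x)
```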


Citation

Pazis, Jason (2012). Non-parametric approximate linear programming for MDPs. Master's thesis, Duke University. Retrieved from https://hdl.handle.net/10161/6189.



Duke's student scholarship is made available to the public using a Creative Commons Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND) license.