dc.contributor.advisor |
Parr, Ronald |
|
dc.contributor.advisor |
Conitzer, Vincent |
|
dc.contributor.advisor |
Maggioni, Mauro |
|
dc.contributor.advisor |
Ferrari, Silvia |
|
dc.contributor.author |
Pazis, Jason |
|
dc.date.accessioned |
2013-01-16T20:47:45Z |
|
dc.date.issued |
2012 |
|
dc.identifier.uri |
https://hdl.handle.net/10161/6189 |
|
dc.description.abstract |
One of the most difficult tasks in value-function-based methods for learning in Markov
Decision Processes is finding an approximation architecture that is expressive enough
to capture the important structure in the value function, while at the same time not
overfitting the training samples. This thesis presents a novel Non-Parametric approach
to Approximate Linear Programming (NP-ALP), which requires nothing more than a smoothness
assumption on the value function. NP-ALP can make use of real-world, noisy sampled
transitions rather than requiring samples from the full Bellman equation, while providing
the first known max-norm, finite-sample performance guarantees for ALP under mild
assumptions. Additionally, NP-ALP is amenable to problems with large (multidimensional)
or even infinite (continuous) action spaces, and does not require a model to select
actions using the resulting approximate solution.
|
|
dc.title |
Non-parametric approximate linear programming for MDPs |
|
dc.type |
Master's thesis |
|
dc.department |
Computer Science |
|
pubs.organisational-group |
Duke |
|
pubs.organisational-group |
Trinity College of Arts & Sciences |
|
pubs.organisational-group |
Computer Science |
|
pubs.publication-status |
Published |
|