Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime

dc.contributor.author

Agazzi, Andrea

dc.contributor.author

Lu, Jianfeng

dc.date.accessioned

2020-11-01T14:22:18Z

dc.date.available

2020-11-01T14:22:18Z

dc.date.updated

2020-11-01T14:22:17Z

dc.description.abstract

We study the problem of policy optimization for infinite-horizon discounted Markov Decision Processes with a softmax policy and nonlinear function approximation trained with policy gradient algorithms. We focus on the training dynamics in the mean-field regime, modeling, e.g., the behavior of wide single-hidden-layer neural networks when exploration is encouraged through entropy regularization. The dynamics of these models are established as a Wasserstein gradient flow of distributions in parameter space. We further prove global optimality of the fixed points of these dynamics under mild conditions on their initialization.
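
For a concrete picture of the setting in the abstract, the sketch below runs entropy-regularized softmax policy gradient on a small randomly generated MDP, with policy logits produced by a single hidden layer network in the mean-field (1/N) scaling. This is a minimal illustration, not the authors' code: the toy MDP, the network sizes, and the O(N) step size (chosen so that the empirical measure of hidden units moves at order-one speed, as in the mean-field parameterization) are all assumptions made for the example.

```python
# Minimal sketch (not the paper's code): entropy-regularized softmax
# policy gradient on a tiny random MDP, with logits from a single
# hidden layer network in mean-field (1/N) scaling. All sizes, the
# random MDP, and the N-scaled step size are illustrative assumptions.
import jax
import jax.numpy as jnp

S, A, N = 4, 3, 512          # states, actions, hidden width
gamma, tau = 0.9, 0.1        # discount factor, entropy-regularization strength

key = jax.random.PRNGKey(0)
k1, k2, k3, k4 = jax.random.split(key, 4)
P = jax.nn.softmax(jax.random.normal(k1, (S, A, S)), axis=-1)  # transition kernel
r = jax.random.uniform(k2, (S, A))                             # reward table
x = jnp.eye(S)                                                 # one-hot state features

# Single hidden layer with mean-field scaling:
# f_theta(s, .) = (1/N) * sum_i c_i * tanh(<w_i, x_s>)
params = {"w": jax.random.normal(k3, (N, S)),
          "c": jax.random.normal(k4, (N, A))}

def logits(params):
    h = jnp.tanh(x @ params["w"].T)      # (S, N) hidden-layer activations
    return h @ params["c"] / N           # (S, A) logits, mean-field 1/N scaling

def neg_value(params):
    pi = jax.nn.softmax(logits(params), axis=-1)               # softmax policy
    # Expected one-step regularized reward: E_pi[r(s,a) - tau * log pi(a|s)]
    r_reg = jnp.sum(pi * (r - tau * jnp.log(pi)), axis=-1)
    P_pi = jnp.einsum("sap,sa->sp", P, pi)                     # transitions under pi
    V = jnp.linalg.solve(jnp.eye(S) - gamma * P_pi, r_reg)     # exact regularized values
    return -V.mean()

grad_fn = jax.jit(jax.grad(neg_value))
lr = 0.1 * N   # O(N) step size: per-unit gradients are O(1/N) under 1/N scaling
for _ in range(200):
    params = jax.tree_util.tree_map(lambda p, g: p - lr * g,
                                    params, grad_fn(params))
print("mean regularized value:", float(-neg_value(params)))
```

In the infinite-width limit, the empirical distribution of the hidden-unit parameters (w_i, c_i) evolving under such updates is what the paper describes as a Wasserstein gradient flow in parameter space.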

dc.identifier.uri

https://hdl.handle.net/10161/21649

dc.subject

cs.LG

dc.subject

stat.ML

dc.title

Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime

dc.type

Journal article

duke.contributor.orcid

Lu, Jianfeng|0000-0001-6255-5165

pubs.organisational-group

Trinity College of Arts & Sciences

pubs.organisational-group

Chemistry

pubs.organisational-group

Mathematics

pubs.organisational-group

Physics

pubs.organisational-group

Duke

Files

Original bundle

Name: 2010.11858v1.pdf
Size: 438.11 KB
Format: Adobe Portable Document Format