Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime
dc.contributor.author | Agazzi, Andrea | |
dc.contributor.author | Lu, Jianfeng | |
dc.date.accessioned | 2020-11-01T14:22:18Z | |
dc.date.available | 2020-11-01T14:22:18Z | |
dc.date.updated | 2020-11-01T14:22:17Z | |
dc.description.abstract | We study the problem of policy optimization for infinite-horizon discounted Markov Decision Processes with softmax policies and nonlinear function approximation trained with policy gradient algorithms. We concentrate on the training dynamics in the mean-field regime, modeling, e.g., the behavior of wide single-hidden-layer neural networks, when exploration is encouraged through entropy regularization. The dynamics of these models are established as a Wasserstein gradient flow of distributions in parameter space. We further prove global optimality of the fixed points of this dynamics under mild conditions on their initialization. | |
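As a rough illustration of the setting described in the abstract (the notation below is standard and not taken from the paper itself), the softmax policy, its mean-field parameterization, and the entropy-regularized objective typically take the form
\[
\pi_\rho(a\mid s) = \frac{\exp\big(f_\rho(s,a)\big)}{\sum_{a'} \exp\big(f_\rho(s,a')\big)},
\qquad
f_\rho(s,a) = \int c\,\sigma\big(w\cdot(s,a)\big)\,\rho(\mathrm{d}c,\mathrm{d}w),
\]
\[
J_\tau(\rho) = \mathbb{E}_{\pi_\rho}\Big[\sum_{t\ge 0}\gamma^t\big(r(s_t,a_t) + \tau\,\mathcal{H}\big(\pi_\rho(\cdot\mid s_t)\big)\big)\Big],
\]
where \(\rho\) is the distribution of hidden-unit parameters obtained as the width of the single hidden layer tends to infinity, \(\sigma\) is the activation function, \(\gamma\) the discount factor, \(\tau>0\) the entropy-regularization strength, and \(\mathcal{H}\) the Shannon entropy; gradient ascent on \(J_\tau\) then corresponds to a Wasserstein gradient flow of \(\rho\) in parameter space, whose fixed points are the object of the paper's global-optimality result.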
dc.identifier.uri | ||
dc.subject | cs.LG | |
dc.subject | stat.ML | |
dc.title | Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime | |
dc.type | Journal article | |
duke.contributor.orcid | Lu, Jianfeng|0000-0001-6255-5165 | |
pubs.organisational-group | Trinity College of Arts & Sciences | |
pubs.organisational-group | Chemistry | |
pubs.organisational-group | Mathematics | |
pubs.organisational-group | Physics | |
pubs.organisational-group | Duke |