Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime
dc.contributor.author | Agazzi, Andrea | |
dc.contributor.author | Lu, Jianfeng | |
dc.date.accessioned | 2020-11-01T14:22:18Z | |
dc.date.available | 2020-11-01T14:22:18Z | |
dc.date.updated | 2020-11-01T14:22:17Z | |
dc.description.abstract | We study the problem of policy optimization for infinite-horizon discounted Markov Decision Processes with softmax policies and nonlinear function approximation trained with policy gradient algorithms. We concentrate on the training dynamics in the mean-field regime, modeling, e.g., the behavior of wide single-hidden-layer neural networks, when exploration is encouraged through entropy regularization. The dynamics of these models are established as a Wasserstein gradient flow of distributions in parameter space. We further prove global optimality of the fixed points of this dynamics under mild conditions on their initialization. | |
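As a rough illustration of the setting described in the abstract (the notation below is standard and not taken from the paper itself), the softmax policy, its mean-field parameterization, and the entropy-regularized objective typically take the form
\[
\pi_\rho(a\mid s) = \frac{\exp\big(f_\rho(s,a)\big)}{\sum_{a'} \exp\big(f_\rho(s,a')\big)},
\qquad
f_\rho(s,a) = \int c\,\sigma\big(w\cdot(s,a)\big)\,\rho(\mathrm{d}c,\mathrm{d}w),
\]
\[
J_\tau(\rho) = \mathbb{E}_{\pi_\rho}\Big[\sum_{t\ge 0}\gamma^t\big(r(s_t,a_t) + \tau\,\mathcal{H}\big(\pi_\rho(\cdot\mid s_t)\big)\big)\Big],
\]
where \(\rho\) is the distribution of hidden-unit parameters obtained as the width of the single hidden layer tends to infinity, \(\sigma\) is the activation function, \(\gamma\) the discount factor, \(\tau>0\) the entropy-regularization strength, and \(\mathcal{H}\) the Shannon entropy; gradient ascent on \(J_\tau\) then corresponds to a Wasserstein gradient flow of \(\rho\) in parameter space, whose fixed points are the object of the paper's global-optimality result.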
dc.identifier.uri | ||
dc.subject | cs.LG | |
dc.subject | stat.ML | |
dc.title | Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime | |
dc.type | Journal article | |
duke.contributor.orcid | Lu, Jianfeng|0000-0001-6255-5165 | |
pubs.organisational-group | Trinity College of Arts & Sciences | |
pubs.organisational-group | Chemistry | |
pubs.organisational-group | Mathematics | |
pubs.organisational-group | Physics | |
pubs.organisational-group | Duke |