Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime
Abstract
We study the problem of policy optimization for infinite-horizon discounted Markov Decision Processes with softmax policies and nonlinear function approximation trained with policy gradient algorithms. We concentrate on the training dynamics in the mean-field regime, modeling, e.g., the behavior of wide single hidden layer neural networks, when exploration is encouraged through entropy regularization. The dynamics of these models is established as a Wasserstein gradient flow of distributions in parameter space. We further prove global optimality of the fixed points of this dynamics under mild conditions on their initialization.
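A minimal sketch of the setting the abstract describes, written under assumed notation (the parameter measure \rho, feature map \phi, regularization strength \tau, and discount factor \gamma are illustrative choices, not taken from the paper). In the mean-field regime the hidden-layer weights are described by a probability measure \rho over parameter space, the softmax policy is built from the resulting mean-field function, and the entropy-regularized return is maximized over \rho; its continuous-time ascent is a Wasserstein gradient flow.

% Mean-field function and softmax policy (illustrative notation)
\[
  f_\rho(s,a) = \int \phi(s,a;\theta)\,\mathrm{d}\rho(\theta),
  \qquad
  \pi_\rho(a\mid s) = \frac{\exp\big(f_\rho(s,a)\big)}{\sum_{a'}\exp\big(f_\rho(s,a')\big)}.
\]
% Entropy-regularized discounted objective, maximized over \rho
\[
  V_\tau(\rho) = \mathbb{E}\Big[\sum_{t\ge 0}\gamma^{t}\big(r(s_t,a_t) - \tau\log\pi_\rho(a_t\mid s_t)\big)\Big],
  \qquad a_t\sim\pi_\rho(\cdot\mid s_t).
\]
% Continuous-time ascent on V_\tau over the space of measures:
% a Wasserstein gradient flow in parameter space
\[
  \partial_t\rho_t = -\,\nabla_\theta\!\cdot\Big(\rho_t\,\nabla_\theta\,\frac{\delta V_\tau}{\delta\rho}(\rho_t)\Big).
\]

The global-optimality statement of the paper then concerns the fixed points of this flow, under mild conditions on the initialization of \rho.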
Scholars@Duke
Jianfeng Lu
Jianfeng Lu is an applied mathematician interested in mathematical analysis and algorithm development for problems from computational physics, theoretical chemistry, materials science, machine learning, and other related fields.
More specifically, his current research interests include:
High dimensional PDEs; generative models and sampling methods; control and reinforcement learning; electronic structure and many body problems; quantum molecular dynamics; multiscale modeling and analysis.