Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime

Abstract

We study the problem of policy optimization for infinite-horizon discounted Markov Decision Processes with softmax policies and nonlinear function approximation trained with policy gradient algorithms. We concentrate on the training dynamics in the mean-field regime, modeling, e.g., the behavior of wide single-hidden-layer neural networks when exploration is encouraged through entropy regularization. The dynamics of these models is established as a Wasserstein gradient flow of distributions in parameter space. We further prove global optimality of the fixed points of this dynamics under mild conditions on their initialization.
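To make the setting concrete, here is a minimal sketch of entropy-regularized softmax policy gradient on a toy tabular MDP. This is purely illustrative and differs from the paper's setting: the paper analyzes nonlinear (single hidden layer, mean-field) parameterizations, whereas this sketch uses one logit per state-action pair, and the transition and reward tables below are invented for the example. The state weighting is taken uniform for simplicity rather than the discounted state-visitation measure.

```python
import numpy as np

# Toy problem sizes and constants (all hypothetical, chosen for illustration).
nS, nA = 2, 2
gamma = 0.9   # discount factor
tau = 0.1     # entropy-regularization strength

# Hypothetical transition kernel P[s, a, s'] and reward table R[s, a].
P = np.array([[[0.8, 0.2], [0.2, 0.8]],
              [[0.6, 0.4], [0.3, 0.7]]])
R = np.array([[1.0, 0.0], [0.0, 1.0]])

def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def soft_value(theta, iters=200):
    """Evaluate the entropy-regularized value of the softmax policy pi(theta):
    V(s) = sum_a pi(a|s) [R(s,a) + gamma * E_{s'} V(s')] + tau * H(pi(.|s)),
    computed by fixed-point iteration (a gamma-contraction)."""
    pi = softmax(theta)
    V = np.zeros(nS)
    for _ in range(iters):
        ent = -(pi * np.log(pi + 1e-12)).sum(axis=1)
        Q = R + gamma * P @ V          # Q[s, a]
        V = (pi * Q).sum(axis=1) + tau * ent
    return V, pi

def pg_step(theta, lr=0.5):
    """One exact policy-gradient ascent step on the regularized objective.
    The gradient w.r.t. the logit theta[s, a] is proportional to
    pi(a|s) * (centered regularized advantage), with uniform state weights."""
    V, pi = soft_value(theta)
    Q = R + gamma * P @ V
    A = Q - tau * np.log(pi + 1e-12)               # regularized "advantage"
    A = A - (pi * A).sum(axis=1, keepdims=True)    # center per state
    return theta + lr * pi * A

theta = np.zeros((nS, nA))   # uniform policy at initialization
vals = []
for _ in range(300):
    theta = pg_step(theta)
    vals.append(soft_value(theta)[0].mean())
```

On this toy problem the regularized objective improves over training, and because of the entropy bonus the policy converges to a stochastic (rather than deterministic) fixed point. The paper's contribution is the much harder infinite-dimensional analogue, where the parameter is a distribution over neuron weights and the gradient ascent above becomes a Wasserstein gradient flow.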


Scholars@Duke

Jianfeng Lu

Professor of Mathematics

Jianfeng Lu is an applied mathematician interested in mathematical analysis and algorithm development for problems from computational physics, theoretical chemistry, materials science, machine learning, and other related fields.

More specifically, his current research focuses on: high-dimensional PDEs; generative models and sampling methods; control and reinforcement learning; electronic structure and many-body problems; quantum molecular dynamics; and multiscale modeling and analysis.


Unless otherwise indicated, scholarly articles published by Duke faculty members are made available here with a CC-BY-NC (Creative Commons Attribution Non-Commercial) license, as enabled by the Duke Open Access Policy. If you wish to use the materials in ways not already permitted under CC-BY-NC, please consult the copyright owner. Other materials are made available here through the author’s grant of a non-exclusive license to make their work openly accessible.