Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime

Agazzi, Andrea; Lu, Jianfeng




Abstract

We study the problem of policy optimization for infinite-horizon discounted Markov Decision Processes with a softmax policy and nonlinear function approximation trained with policy gradient algorithms. We concentrate on the training dynamics in the mean-field regime, which models, for example, the behavior of wide single hidden layer neural networks, when exploration is encouraged through entropy regularization. The dynamics of these models is established as a Wasserstein gradient flow of distributions in parameter space. We further prove global optimality of the fixed points of this dynamics under mild conditions on their initialization.
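As a rough illustration of the objects named in the abstract, the sketch below builds a softmax policy from a wide single hidden layer network and evaluates the entropy term used for regularization. It is a minimal sketch under stated assumptions, not the construction from the paper: the tanh activation, the per-action feature vectors, the Gaussian initialization, and the names `mean_field_logits`, `softmax_policy`, and `entropy` are all hypothetical choices made here. The only structural feature taken from the abstract is that the network output is an average over hidden units, which is what becomes an integral against a distribution over parameters in the mean-field limit.

```python
import numpy as np

def mean_field_logits(features, weights, coeffs):
    """Averaged single hidden layer network (assumed form):
    f(s, a) = (1/N) * sum_i c_i * tanh(w_i . x(s, a)).

    features: (A, d) array, one feature vector per action for a fixed state.
    weights:  (N, d) hidden-unit weights.
    coeffs:   (N,) output coefficients.
    """
    hidden = np.tanh(features @ weights.T)   # shape (A, N)
    return hidden @ coeffs / len(coeffs)     # shape (A,), average over N units

def softmax_policy(logits):
    """Softmax over actions, shifted by the max logit for numerical stability."""
    shifted = logits - logits.max()
    probs = np.exp(shifted)
    return probs / probs.sum()

def entropy(probs):
    """Shannon entropy of the action distribution (the regularization term)."""
    return -np.sum(probs * np.log(probs))

# Hypothetical sizes and random parameters, chosen only for illustration.
rng = np.random.default_rng(0)
num_actions, dim, num_units = 4, 3, 1000
features = rng.normal(size=(num_actions, dim))
weights = rng.normal(size=(num_units, dim))
coeffs = rng.normal(size=num_units)

policy = softmax_policy(mean_field_logits(features, weights, coeffs))
policy_entropy = entropy(policy)
```

Because the softmax assigns strictly positive probability to every action, the entropy is always finite and bounded above by the logarithm of the number of actions; an entropy-regularized objective rewards policies that stay close to that upper bound.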

Type

Journal article

Subjects

cs.LG, stat.ML

Permalink

https://hdl.handle.net/10161/21649

Collections

Scholarly Articles

Scholars@Duke

Jianfeng Lu

James B. Duke Distinguished Professor of Mathematics

Jianfeng Lu is an applied mathematician interested in mathematical analysis and algorithm development for problems from computational physics, theoretical chemistry, materials science, machine learning, and other related fields.

More specifically, his current research focuses on high-dimensional PDEs; generative models and sampling methods; control and reinforcement learning; electronic structure and many-body problems; quantum molecular dynamics; and multiscale modeling and analysis.

Unless otherwise indicated, scholarly articles published by Duke faculty members are made available here with a CC-BY-NC (Creative Commons Attribution Non-Commercial) license, as enabled by the Duke Open Access Policy. If you wish to use the materials in ways not already permitted under CC-BY-NC, please consult the copyright owner. Other materials are made available here through the author’s grant of a non-exclusive license to make their work openly accessible.
