Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning
Date
2023-02-28
Journal Title
Journal ISSN
Volume Title
Repository Usage Stats
views
downloads
Citation Stats
Abstract
We study a multi-agent reinforcement learning (MARL) problem where the agents interact over a given network. The goal of the agents is to cooperatively maximize the average of their entropy-regularized long-term rewards. To overcome the curse of dimensionality and to reduce communication, we propose a Localized Policy Iteration (LPI) algorithm that provably learns a near-globally-optimal policy using only local information. In particular, we show that, despite restricting each agent's attention to only its κ-hop neighborhood, the agents are able to learn a policy with an optimality gap that decays polynomially in κ. In addition, we show the finite-sample convergence of LPI to the global optimal policy, which explicitly captures the trade-off between optimality and computational complexity in choosing κ. Numerical simulations demonstrate the effectiveness of LPI.
Type
Department
Description
Provenance
Citation
Permalink
Published Version (Please cite this version)
Publication Info
Zhang, Y, G Qu, P Xu, Y Lin, Z Chen and A Wierman (2023). Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 7(1). pp. 1–51. 10.1145/3579443 Retrieved from https://hdl.handle.net/10161/30775.
This is constructed from limited available data and may be imprecise. To cite this article, please review & use the official citation provided by the journal.
Collections
Scholars@Duke
Pan Xu
My research is centered around Machine Learning, with broad interests in the areas of Artificial Intelligence, Data Science, Optimization, Reinforcement Learning, High Dimensional Statistics, and their applications to real-world problems including Bioinformatics and Healthcare. My research goal is to develop computationally- and data-efficient machine learning algorithms with both strong empirical performance and theoretical guarantees.
Unless otherwise indicated, scholarly articles published by Duke faculty members are made available here with a CC-BY-NC (Creative Commons Attribution Non-Commercial) license, as enabled by the Duke Open Access Policy. If you wish to use the materials in ways not already permitted under CC-BY-NC, please consult the copyright owner. Other materials are made available here through the author’s grant of a non-exclusive license to make their work openly accessible.