

Poster in Workshop: Gamification and Multiagent Solutions

Staged independent learning: Towards decentralized cooperative multi-agent Reinforcement Learning

Hadi Nekoei · Akilesh Badrinaaraayanan · Amit Sinha · Mohammad Amini · Janarthanan Rajendran · Aditya Mahajan · Sarath Chandar


Abstract:

We empirically show that classic ideas from two-time-scale stochastic approximation \citep{borkar1997stochastic} can be extended to complex cooperative multi-agent reinforcement learning (MARL) problems. We first present a multi-agent estimation problem as a motivating example in which staged best-response iteration converges while parallel best-response iteration does not. We then present a general implementation of staged multi-agent RL algorithms based on multi-time-scale stochastic approximation, and show that our new method, Staged Independent Proximal Policy Optimization (SIPPO), outperforms state-of-the-art independent learning on almost all tasks in the EPyMARL \citep{papoudakis2020benchmarking} benchmark. This can be seen as a first step towards more decentralized MARL methods based on the multi-time-scale learning principle.
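The staged-vs-parallel distinction can be made concrete with a toy quadratic estimation game (an illustrative sketch, not the paper's actual motivating example). Three agents each control one coordinate of a shared estimate and best-respond to the others under a joint quadratic cost `0.5 * x^T A x` with a hypothetical symmetric positive-definite `A`. Updating all agents simultaneously (a Jacobi-style parallel best response) can diverge, while updating them one at a time so each sees the others' latest values (a Gauss-Seidel-style staged best response) converges:

```python
import numpy as np

# Hypothetical coupling matrix (assumption, chosen for illustration):
# symmetric positive definite, but with strong enough off-diagonal
# coupling that the parallel best-response iteration is unstable.
a = 0.6
A = np.array([[1.0, a, a],
              [a, 1.0, a],
              [a, a, 1.0]])  # eigenvalues {1 + 2a, 1 - a, 1 - a} > 0

def parallel_best_response(x, steps):
    """All agents best-respond simultaneously to the previous iterate."""
    for _ in range(steps):
        # Each agent i minimizes its cost given the OLD values of the others.
        x = np.array([-(A[i] @ x - A[i, i] * x[i]) / A[i, i]
                      for i in range(len(x))])
    return x

def staged_best_response(x, steps):
    """Agents best-respond one at a time, each seeing the latest values."""
    x = x.copy()
    for _ in range(steps):
        for i in range(len(x)):
            # Agent i minimizes its cost given the CURRENT values of the others.
            x[i] = -(A[i] @ x - A[i, i] * x[i]) / A[i, i]
    return x

x0 = np.array([1.0, -1.0, 0.5])
print(np.linalg.norm(parallel_best_response(x0, 50)))  # grows without bound
print(np.linalg.norm(staged_best_response(x0, 50)))    # shrinks toward 0
```

Here the parallel iteration matrix has spectral radius 1.2 (it diverges), while the staged iteration contracts; SIPPO's multi-time-scale schedule generalizes this idea by letting one agent's learning effectively converge between updates of the others.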
