

Poster in Workshop: Gamification and Multiagent Solutions

Stackelberg Policy Gradient: Evaluating the Performance of Leaders and Followers

Quoc-Liem Vu · Zane Alumbaugh · Ryan Ching · Quanchen Ding · Arnav Mahajan · Benjamin Chasnov · Sam Burden · Lillian J Ratliff


Abstract:

Hierarchical order of play is an important concept in reinforcement learning for better understanding the decisions made by strategic agents in a shared environment. In this paper, we compare the learning dynamics of Stackelberg and simultaneous-play reinforcement learning agents. Agents are trained with policy gradient methods and tested against one another in a tournament. We compare agent performance in zero-sum and non-zero-sum Markov games. We show that, under the same hyperparameters, the Stackelberg leader performs better in training. However, under the same hyperparameters in the tournament setting, Stackelberg leaders and followers perform similarly to simultaneous-play agents. Analytically, hierarchical training can potentially provide stronger guarantees for policy gradient methods.
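To illustrate the distinction between the two update rules, the sketch below contrasts a simultaneous policy-gradient step with a Stackelberg step on a smooth two-player game. It is not the authors' implementation: the quadratic cost functions, step sizes, and variable names are illustrative assumptions, and the leader's update uses the implicit-function-theorem form of the total derivative commonly used to define the Stackelberg gradient.

# Minimal sketch (illustrative, not the paper's code): simultaneous vs.
# Stackelberg gradient updates on a two-player quadratic game.
import jax.numpy as jnp
from jax import grad, hessian, jacfwd

def f1(x, y):   # leader cost (minimized by the leader); assumed for illustration
    return 0.5 * x @ x + x @ y

def f2(x, y):   # follower cost (strongly convex in y); assumed for illustration
    return 0.5 * y @ y - x @ y

d1x  = grad(f1, argnums=0)                       # leader's partial gradient
d1y  = grad(f1, argnums=1)
d2y  = grad(f2, argnums=1)                       # follower's partial gradient
d2yy = hessian(f2, argnums=1)                    # follower curvature in y
d2yx = jacfwd(grad(f2, argnums=1), argnums=0)    # mixed second derivative

def simultaneous_step(x, y, lr=0.1):
    # Each player follows its own partial gradient, treating the other as fixed.
    return x - lr * d1x(x, y), y - lr * d2y(x, y)

def stackelberg_step(x, y, lr=0.1):
    # Leader anticipates the follower's response via the implicit function
    # theorem: dy*/dx = -(d^2 f2/dy^2)^{-1} (d^2 f2/dy dx).
    dydx = -jnp.linalg.solve(d2yy(x, y), d2yx(x, y))
    leader_grad = d1x(x, y) + dydx.T @ d1y(x, y)  # total derivative of f1
    return x - lr * leader_grad, y - lr * d2y(x, y)

x, y = jnp.ones(2), jnp.ones(2)
for _ in range(100):
    x, y = stackelberg_step(x, y)
print(x, y)  # for this illustrative quadratic game, both iterates approach the origin

In this toy setting the follower's best response is y*(x) = x, so the leader's total gradient differs from its partial gradient; the gap between the two updates is what the training and tournament comparisons in the paper probe at scale.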
