Spotlight Poster
Proximal Policy Gradient Arborescence for Quality Diversity Reinforcement Learning
Sumeet Batra · Bryon Tjanaka · Matthew Fontaine · Aleksei Petrenko · Stefanos Nikolaidis · Gaurav Sukhatme
Halle B
Training generally capable agents that thoroughly explore their environment and learn new and diverse skills is a long-term goal of robot learning. Quality Diversity Reinforcement Learning (QD-RL) is an emerging research area that blends the best aspects of both fields – Quality Diversity (QD) provides a principled form of exploration and produces collections of behaviorally diverse agents, while Reinforcement Learning (RL) provides a powerful performance improvement operator enabling generalization across tasks and dynamic environments. Existing QD-RL approaches have been constrained to sample-efficient, deterministic off-policy RL algorithms and/or evolution strategies and struggle with highly stochastic environments. In this work, we, for the first time, adapt on-policy RL, specifically Proximal Policy Optimization (PPO), to the Differentiable Quality Diversity (DQD) framework and propose several changes that enable efficient optimization and discovery of novel skills on high-dimensional, stochastic robotics tasks. Our new algorithm, Proximal Policy Gradient Arborescence (PPGA), achieves state-of-the-art results, including a 4x improvement in best reward over baselines on the challenging humanoid domain.