Poster in Workshop: Physics for Machine Learning
THE RL PERCEPTRON: DYNAMICS OF POLICY LEARNING IN HIGH DIMENSIONS
Nishil Patel · Sebastian Lee · Stefano Mannelli · Sebastian Goldt · Andrew Saxe
Reinforcement learning (RL) algorithms have proven transformative in a range of domains. To tackle real-world domains, these systems often use neural networks to learn policies directly from pixels or other high-dimensional sensory input. By contrast, much theory of RL has focused on discrete state spaces or worst-case analyses, and fundamental questions remain about the dynamics of policy learning in high-dimensional settings. Here we propose a simple high-dimensional model of RL and derive its typical dynamics as a set of closed-form ODEs. We show that the model exhibits rich behavior including delayed learning under sparse rewards; a speed-accuracy trade-off depending on reward stringency; and a dependence of learning regime on reward baselines. These results offer a first step toward understanding policy gradient methods in high-dimensional settings.