Spotlight
Massively Scalable Inverse Reinforcement Learning for Route Optimization
Matt Barnes · Matthew Abueg · Oliver Lange · Matt Deeds · Jason Trader · Denali Molitor · Markus Wulfmeier · Shawn O'Banion
Optimizing for humans’ latent preferences remains a grand challenge in route recommendation. Prior research has provided increasingly general methods based on inverse reinforcement learning (IRL), yet no approach has successfully addressed planetary-scale routing problems with hundreds of millions of states and demonstration trajectories. In this paper, we introduce scaling techniques based on graph compression, spatial parallelization, and improved initialization conditions inspired by a connection to eigenvector algorithms. We revisit classic algorithms in the routing context, and make the key observation that there exists a trade-off between the use of cheap, deterministic planners and expensive yet robust stochastic policies. This insight is leveraged in Receding Horizon Inverse Planning (RHIP), a new generalization of classic IRL algorithms that provides fine-grained control over performance trade-offs via its planning horizon. Our contributions culminate in a policy that achieves a 16-24% improvement in route quality at a global scale, and to the best of our knowledge, represents the largest published benchmark of IRL algorithms in a real-world setting to date. We conclude by conducting an ablation study of key components, presenting negative results from alternative eigenvalue solvers, and identifying opportunities to further improve scalability via IRL-specific batching strategies.