In-Person Poster presentation / poster accept
Backstepping Temporal Difference Learning
Han-Dong Lim · Donghwan Lee
MH1-2-3-4 #101
Keywords: [ policy evaluation ] [ temporal difference learning ] [ reinforcement learning ]
The ability to learn off-policy is an important feature of reinforcement learning (RL) for practical applications. However, even one of the most elementary RL algorithms, temporal-difference (TD) learning, is known to suffer from divergence when the off-policy scheme is used together with linear function approximation. To overcome this divergent behavior, several off-policy TD learning algorithms have been developed. In this work, we provide a unified view of such algorithms from a purely control-theoretic perspective. Our method relies on the backstepping technique, which is widely used in nonlinear control theory.
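The divergence the abstract refers to can be seen in a minimal sketch (not from the paper): the classic two-state counterexample for off-policy linear TD(0), in the spirit of Tsitsiklis and Van Roy (1997). With scalar features phi(s1)=1, phi(s2)=2, zero reward, and a behavior policy that only samples the s1 -> s2 transition, the semi-gradient TD(0) update multiplies the weight by (1 + alpha*(2*gamma - 1)) each step, which grows without bound for gamma > 0.5. All names (gamma, alpha, phi, w) are illustrative choices, not identifiers from the paper.

```python
import numpy as np

# Two-state "w -> 2w" counterexample: features phi(s1)=1, phi(s2)=2,
# zero reward. An off-policy behavior distribution that only visits the
# s1 -> s2 transition makes linear semi-gradient TD(0) diverge for gamma > 0.5.

gamma, alpha = 0.99, 0.1           # discount factor, step size (illustrative)
phi = {"s1": 1.0, "s2": 2.0}       # linear features for a scalar weight w
w = 1.0                            # initial weight

for t in range(100):
    # Sampled transition: s1 -> s2 with reward 0.
    td_error = 0.0 + gamma * w * phi["s2"] - w * phi["s1"]
    w += alpha * td_error * phi["s1"]   # semi-gradient TD(0) update

# Each update scales w by 1 + alpha*(2*gamma - 1) ~ 1.098, so w blows up.
print(f"weight after 100 off-policy TD(0) updates: {w:.3e}")
```

Running the sketch shows the weight growing to roughly 1e4 after 100 updates, illustrating why the stabilized off-policy TD variants surveyed in this work are needed.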