Virtual presentation / poster accept
Decision S4: Efficient Sequence-Based RL via State Spaces Layers
Shmuel Bar David · Itamar Zimerman · Eliya Nachmani · Lior Wolf
Keywords: [ S4 ] [ Sequential RL ] [ Decision transformers ] [ Reinforcement Learning ]
Recently, sequence learning methods have been applied to the problem of off-policy Reinforcement Learning, including the seminal work on Decision Transformers, which employs transformers for this task. Since transformers are parameter-heavy, cannot benefit from history longer than a fixed window size, and are not computed using recurrence, we set out to investigate the suitability of the S4 family of models, which are based on state-space layers and have been shown to outperform transformers, especially in modeling long-range dependencies. In this work, we present two main algorithms: (i) an off-policy training procedure that works with trajectories, while still maintaining the training efficiency of the S4 model. (ii) An on-policy training procedure that is trained in a recurrent manner, benefits from long-range dependencies, and is based on a novel stable actor-critic mechanism. Our results indicate that our method outperforms multiple variants of Decision Transformers, as well as the other baseline methods, on most tasks, while reducing the latency, number of parameters, and training time by several orders of magnitude, making our approach more suitable for real-world RL.
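The abstract's core contrast is between a transformer's fixed attention window and S4's recurrent computation. As a minimal sketch of the underlying idea (illustrative only; the actual S4 layer uses a structured, HiPPO-initialized state matrix and a learned discretization, and the names A_bar, B_bar, C, ssm_step below are assumptions, not the paper's code), a discretized state-space recurrence makes the per-step cost constant while the hidden state summarizes the entire, unbounded history:

```python
import numpy as np

# Toy parameters for a single-input, single-output SSM channel.
# Real S4 layers learn these; the values here are placeholders.
state_dim, input_dim = 16, 1
rng = np.random.default_rng(0)
A_bar = 0.99 * np.eye(state_dim)                 # discretized state matrix (toy: stable diagonal)
B_bar = rng.normal(size=(state_dim, input_dim))  # discretized input matrix
C = rng.normal(size=(input_dim, state_dim))      # readout matrix

def ssm_step(x, u):
    """One recurrent step: O(1) cost per token, no fixed context window."""
    x = A_bar @ x + B_bar @ u  # state update carries the entire past
    y = C @ x                  # output computed from the compressed history
    return x, y

x = np.zeros((state_dim, 1))
for t in range(100):
    u = np.array([[np.sin(0.1 * t)]])  # stand-in for a per-step input token
    x, y = ssm_step(x, u)              # state x now summarizes all inputs up to t
```

At inference time this recurrence gives low per-step latency, while during training the same linear system can be unrolled as a convolution over the whole trajectory, which is the efficiency property the off-policy procedure above relies on.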