ICLR Poster Neural Deep Equilibrium Solvers

Shaojie Bai · Vladlen Koltun · Zico Kolter

Keywords: [ deep equilibrium models ] [ deep learning ] [ implicit models ]

Abstract: A deep equilibrium (DEQ) model abandons traditional depth by solving for the fixed point of a single nonlinear layer $f_\theta$. This structure decouples the internal structure of the layer (which controls representational capacity) from how the fixed point is actually computed (which impacts inference-time efficiency); the latter is usually done with classic techniques such as Broyden's method or Anderson acceleration. In this paper, we show that one can exploit this decoupling and substantially enhance the fixed-point computation using a custom neural solver. Specifically, our solver uses a parameterized network both to guess an initial value of the optimization and to perform iterative updates, in a method that generalizes a learnable form of Anderson acceleration and can be trained end-to-end in an unsupervised manner. Such a solution is particularly well suited to the implicit model setting, because inference in these models requires repeatedly solving for a fixed point of the same nonlinear layer for different inputs, a task at which our network excels. Our experiments show that these neural equilibrium solvers are fast to train (taking only an extra 0.9-1.1% over the original DEQ's training time) and require few additional parameters (1-3% of the original model size), yet lead to a $2\times$ speedup in DEQ network inference without any degradation in accuracy across numerous domains and tasks.
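To make the fixed-point setting concrete, the following is a minimal NumPy sketch of the classic baseline named in the abstract: plain forward iteration versus Anderson acceleration on a toy layer. It is not the authors' neural solver. The layer $f(z, x) = \tanh(Wz + x)$, the names `make_layer`, `forward_iteration`, and `anderson`, and all sizes and tolerances are assumptions chosen for illustration. The neural solver described above would, roughly, replace the zero initial guess and the least-squares mixing weights with the outputs of a small learned network.

```python
import numpy as np

def make_layer(dim, seed=0):
    """Toy stand-in for f_theta (an assumption, not the paper's layer)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((dim, dim))
    W *= 0.9 / np.linalg.norm(W, 2)  # spectral norm 0.9, so f is a contraction in z
    def f(z, x):
        return np.tanh(W @ z + x)
    return f

def forward_iteration(f, x, z0, tol=1e-8, max_iter=500):
    """Naive solver: z <- f(z, x) until the residual ||f(z, x) - z|| < tol."""
    z = z0
    for k in range(1, max_iter + 1):
        fz = f(z, x)
        if np.linalg.norm(fz - z) < tol:
            return z, k  # k = number of evaluations of f
        z = fz
    return z, max_iter

def anderson(f, x, z0, m=5, tol=1e-8, max_iter=500):
    """Anderson acceleration (type II, mixing parameter 1) with window size m."""
    z = z0
    fz = f(z, x)
    g = fz - z           # current residual
    dF, dG = [], []      # differences of f-values and of residuals
    for k in range(1, max_iter + 1):
        if np.linalg.norm(g) < tol:
            return z, k  # k = number of evaluations of f
        if dG:
            # Least-squares mixing weights: minimize ||g - dG @ gamma||.
            gamma = np.linalg.lstsq(np.stack(dG, axis=1), g, rcond=None)[0]
            z_new = fz - np.stack(dF, axis=1) @ gamma
        else:
            z_new = fz   # first step is a plain forward step
        f_new = f(z_new, x)
        g_new = f_new - z_new
        dF.append(f_new - fz)
        dG.append(g_new - g)
        dF, dG = dF[-m:], dG[-m:]  # keep only the last m differences
        z, fz, g = z_new, f_new, g_new
    return z, max_iter

# Solve for the fixed point of the same layer with both methods.
f = make_layer(dim=50)
x = np.random.default_rng(1).standard_normal(50)
z0 = np.zeros(50)
z_fwd, n_fwd = forward_iteration(f, x, z0)
z_and, n_and = anderson(f, x, z0)
print(f"forward iteration: {n_fwd} evaluations of f")
print(f"Anderson (m={5}):    {n_and} evaluations of f")
```

Both routines count evaluations of `f`, which is the dominant inference cost in a DEQ. Because the same layer is solved repeatedly for different inputs `x`, any improvement in the initial guess or the update rule is paid back on every forward pass, which is the opening the abstract describes for a learned solver.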