Poster
Revisiting the Last-Iterate Convergence of Stochastic Gradient Methods
Zijian Liu · Zhengyuan Zhou
Halle B
Abstract:
In the past several years, the convergence of the last iterate of the Stochastic Gradient Descent (SGD) algorithm has attracted considerable interest due to its good practical performance but limited theoretical understanding. For Lipschitz convex functions, different works have established optimal $O(\log(1/\delta)\log T/\sqrt{T})$ or $O(\sqrt{\log(1/\delta)/T})$ high-probability convergence rates for the final iterate, where $T$ is the time horizon and $\delta$ is the failure probability. However, to prove these bounds, all existing works are limited to compact domains, and almost all of them additionally require almost surely bounded noise. It is natural to ask whether the last iterate of SGD can still guarantee the optimal convergence rate without these two restrictive assumptions. Beyond this important question, several theoretical problems remain open. For example, in contrast to the last-iterate convergence of SGD for non-smooth problems, only very few results for smooth optimization have been developed so far. Additionally, the existing results are all limited to a single objective and the standard Euclidean norm, and it remains unclear whether last-iterate convergence can be provably extended to the broader settings of composite optimization and non-Euclidean norms. In this work, to address the issues mentioned above, we revisit the last-iterate convergence of stochastic gradient methods and provide the first unified way to prove convergence rates both in expectation and in high probability that accommodates general domains, composite objectives, non-Euclidean norms, Lipschitz conditions, smoothness, and (strong) convexity simultaneously.
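For readers unfamiliar with the distinction the abstract draws, the sketch below is not taken from the paper: the objective, noise model, step-size schedule, and domain are illustrative assumptions. It runs projected SGD on a convex, Lipschitz problem and compares the last iterate $x_T$, whose convergence the paper studies, with the classical averaged iterate $\frac{1}{T}\sum_{t=1}^{T} x_t$.

```python
# Minimal sketch (not the paper's algorithm): projected SGD on the convex,
# Lipschitz objective f(x) = ||x - x_star||_1 with noisy subgradients,
# contrasting the last iterate x_T with the averaged iterate (1/T) * sum_t x_t.
import numpy as np

rng = np.random.default_rng(0)
d, T, R, eta0 = 20, 5000, 10.0, 1.0   # dimension, horizon, domain radius, base step size
x_star = rng.uniform(-1.0, 1.0, size=d)

def f(x):
    return np.sum(np.abs(x - x_star))

def noisy_subgrad(x):
    # Subgradient of ||x - x_star||_1 plus zero-mean Gaussian noise (an assumed noise model).
    return np.sign(x - x_star) + 0.5 * rng.standard_normal(d)

def project(x, radius=R):
    # Euclidean projection onto the ball {x : ||x||_2 <= radius}.
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

x = project(rng.uniform(-R, R, size=d))
running_sum = np.zeros(d)
for t in range(1, T + 1):
    g = noisy_subgrad(x)
    x = project(x - (eta0 / np.sqrt(t)) * g)   # step size eta_t = eta0 / sqrt(t)
    running_sum += x

x_last, x_avg = x, running_sum / T
print(f"f(last iterate) = {f(x_last):.4f}, f(averaged iterate) = {f(x_avg):.4f}, f(x*) = 0")
```

Classical analyses bound the suboptimality of the averaged iterate; the rates quoted in the abstract concern the last iterate, which is what practitioners typically return.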