Poster
Anisotropy helps: improved statistical and computational complexity of the mean-field Langevin dynamics under structured data
Atsushi Nitanda · Kazusato Oko · Taiji Suzuki · Denny Wu
Halle B
Recent works have shown that neural networks optimized by gradient-based methods can adapt to sparse or low-dimensional target functions through feature learning; an often-studied target is the sparse parity function defined on the unit hypercube. However, such an isotropic data setting does not capture the anisotropy and low intrinsic dimensionality exhibited in realistic datasets. In this work, we address this shortcoming by studying how gradient-based feature learning interacts with structured (anisotropic) input data: we consider the sparse parity problem on a high-dimensional orthotope where the feature coordinates have varying magnitudes, and analyze the learning complexity of the mean-field Langevin dynamics (MFLD), which describes the noisy gradient descent update on a two-layer neural network. We show that the statistical complexity (i.e., sample size) and computational complexity (i.e., width of the neural network) of MFLD can both be improved when the prominent directions of the anisotropic input data align with the support of the target function. Moreover, by employing an anisotropic weight decay regularization determined by the gradient covariance, the problem can be efficiently learned by a constant-width neural network.
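To make the setting concrete, the following is a minimal sketch, not the authors' implementation. It samples sparse parity data on an orthotope whose coordinates have different magnitudes, then applies one noisy gradient descent step to a mean-field two-layer network. The function names, the tanh activation, the squared loss, the choice of the first `k` coordinates as the parity support, and the per-coordinate `decay` vector are all assumptions made for illustration; in particular, `decay` only stands in for the gradient-covariance-based regularizer mentioned above and does not reproduce its construction.

```python
import math
import random


def sample_anisotropic_parity(n, d, k, scales, rng):
    """Sample n points with coordinate c equal to +/- scales[c].

    The label is the product of the signs of the first k coordinates
    (assumed support of the parity, chosen here for illustration).
    """
    xs, ys = [], []
    for _ in range(n):
        signs = [rng.choice((-1.0, 1.0)) for _ in range(d)]
        xs.append([s * z for s, z in zip(scales, signs)])
        ys.append(math.prod(signs[:k]))
    return xs, ys


def forward(W, x):
    """Mean-field two-layer network: average of tanh(w . x) over neurons."""
    return sum(math.tanh(sum(wi * xi for wi, xi in zip(w, x))) for w in W) / len(W)


def noisy_gd_step(W, xs, ys, eta, decay, temp, rng):
    """One noisy gradient descent step on every neuron.

    Each neuron moves along the gradient of the first variation of the
    squared loss (assumed loss), plus a per-coordinate weight decay, plus
    Gaussian noise of scale sqrt(2 * eta * temp).
    """
    n, d = len(xs), len(xs[0])
    residuals = [forward(W, x) - y for x, y in zip(xs, ys)]
    noise_scale = math.sqrt(2.0 * eta * temp)
    new_W = []
    for w in W:
        grad = [0.0] * d
        for x, r in zip(xs, residuals):
            pre = sum(wi * xi for wi, xi in zip(w, x))
            g = r * (1.0 - math.tanh(pre) ** 2) / n
            for c in range(d):
                grad[c] += g * x[c]
        new_W.append([
            w[c] - eta * (grad[c] + decay[c] * w[c]) + noise_scale * rng.gauss(0.0, 1.0)
            for c in range(d)
        ])
    return new_W


if __name__ == "__main__":
    rng = random.Random(0)
    d, k = 6, 2
    # Larger magnitudes on the first k coordinates: the prominent data
    # directions are aligned with the support of the target.
    scales = [2.0] * k + [0.5] * (d - k)
    xs, ys = sample_anisotropic_parity(32, d, k, scales, rng)
    W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(8)]
    # Weaker decay on the prominent coordinates (illustrative values only).
    decay = [0.01] * k + [0.1] * (d - k)
    for _ in range(10):
        W = noisy_gd_step(W, xs, ys, eta=0.1, decay=decay, temp=0.01, rng=rng)
    print(len(W), len(W[0]))
```

Setting every entry of `scales` to the same value recovers the isotropic hypercube setting, and passing a constant `decay` vector gives ordinary isotropic weight decay, so the same sketch can be used to compare the two regimes informally.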