Spotlight

Role of Locality and Weight Sharing in Image-Based Tasks: A Sample Complexity Separation between CNNs, LCNs, and FCNs

Aakash Sunil Lahoti · Stefani Karp · Ezra Winston · Aarti Singh · Yuanzhi Li

Abstract: Vision-based tasks are known to exhibit the properties of locality and translation invariance. The superior performance of convolutional neural networks (CNNs) on these tasks is attributed to the inductive biases of locality and weight sharing baked into their architecture. Existing attempts at quantifying the statistical benefits of these biases in CNNs over local convolutional neural networks (LCNs) and fully connected neural networks (FCNs) either do not establish a gap between these architectures, ignore optimization considerations, or consider stylized settings that are not reflective of image-like tasks, particularly translation invariance. We introduce the Dynamic Signal Distribution (DSD), a data model designed to capture properties of real-world images such as locality and translation invariance. In DSD, each image is modeled as $k$ patches, each of dimension $d$, and the label is determined by a $d$-sparse signal vector that can appear freely in any one of the $k$ patches. Under this task, we show that CNNs trained with gradient descent require $\tilde{O}(k+d)$ samples to predict the label, whereas LCNs require $\Omega(kd)$ samples, establishing the statistical advantage of weight sharing in translation-invariant tasks. Additionally, LCNs need $\tilde{O}(k(k+d))$ samples, compared to FCNs, which need $\Omega(k^2d)$ samples, showcasing the benefit of locality in local tasks.
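To make the data model concrete, below is a minimal sketch of sampling from a DSD-like distribution: a unit-norm signal direction is planted in one uniformly chosen patch out of $k$, the remaining coordinates are Gaussian noise, and the label is the sign of the planted coefficient, so it is invariant to which patch carries the signal. The function name `sample_dsd` and these distributional choices (Gaussian noise, sign labels) are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def sample_dsd(n, k, d, seed=None):
    """Sample n inputs from a DSD-like planted-signal distribution (sketch).

    Each input has k patches of dimension d. A fixed unit-norm signal
    vector w is added, scaled by the +/-1 label, to one uniformly chosen
    patch; all other coordinates are standard Gaussian noise. Because the
    signal patch is chosen uniformly, the labeling rule is translation
    invariant across patches.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)               # unit-norm signal direction

    X = rng.standard_normal((n, k, d))   # Gaussian background noise everywhere
    y = rng.choice([-1.0, 1.0], size=n)  # labels, +/-1 with equal probability
    pos = rng.integers(0, k, size=n)     # patch index carrying the signal

    # Plant the labeled signal in the chosen patch of each input; the
    # resulting signal component is d-sparse in the flattened k*d space.
    X[np.arange(n), pos] += y[:, None] * w
    return X.reshape(n, k * d), y

# Example: 8 inputs with k = 5 patches of dimension d = 16.
X, y = sample_dsd(n=8, k=5, d=16, seed=0)
print(X.shape, y)                        # (8, 80) and an array of +/-1 labels
```

Under this kind of distribution, a weight-shared convolutional filter can reuse one $d$-dimensional detector across all $k$ patch positions, whereas an LCN must learn a separate detector per position and an FCN must additionally discover the patch structure, which is the intuition behind the sample complexity separations stated in the abstract.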
