Poster
MogaNet: Multi-order Gated Aggregation Network
Siyuan Li · Zedong Wang · Zicheng Liu · Cheng Tan · Haitao Lin · Di Wu · Zhiyuan Chen · Jiangbin Zheng · Stan Z Li
Halle B
By contextualizing the kernel as globally as possible, modern ConvNets have shown great potential in computer vision tasks. However, recent progress on multi-order game-theoretic interaction in deep neural networks (DNNs) shows that the representation capacity of modern ConvNets has not been fully exploited: the most expressive interactions are not effectively encoded as the kernel size increases. To address this challenge, we propose a new family of modern ConvNets, dubbed MogaNet, for discriminative visual representation learning in pure ConvNet-based models with favorable complexity-performance trade-offs. MogaNet encapsulates conceptually simple yet effective convolutions and gated aggregation into a compact module, in which discriminative features are efficiently gathered and contextualized in an adaptive manner. Extensive experiments show that MogaNet exhibits great scalability, impressive parameter efficiency, and competitive performance compared to state-of-the-art ViTs and ConvNets on ImageNet and various downstream vision benchmarks, including COCO object detection, ADE20K semantic segmentation, 2D and 3D human pose estimation, and video prediction. Notably, MogaNet reaches 80.0% and 87.8% accuracy with 5.2M and 181M parameters on ImageNet-1K, outperforming ParC-Net and ConvNeXt-L while saving 59% FLOPs and 17M parameters, respectively.
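The abstract does not give the module's internals, so the following is only a rough sketch of what "convolutions and gated aggregation" could look like, reduced to a single-channel 1D signal in plain Python. Everything specific here is an assumption made for illustration rather than the authors' design: the function names, the SiLU activation, the averaging kernels, the dilation rates, and the scalar `gate_weight` standing in for a learned projection.

```python
import math


def silu(v):
    """SiLU activation, v * sigmoid(v) (assumed choice of gate nonlinearity)."""
    return v / (1.0 + math.exp(-v))


def dilated_conv1d(x, kernel, dilation):
    """Zero-padded, same-length 1D cross-correlation with a dilated kernel."""
    half = (len(kernel) // 2) * dilation
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i - half + j * dilation
            if 0 <= idx < len(x):  # positions outside the signal count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def gated_aggregation(x, kernels, dilations, gate_weight=1.0):
    """Hypothetical gated aggregation on a 1D signal.

    Context branch: sum of convolutions at several dilation rates,
    so each position sees neighbourhoods of different extents.
    Gate branch: SiLU of a scaled copy of the input.
    Output: elementwise product of gate and activated context.
    """
    gate = [silu(gate_weight * v) for v in x]
    context = [0.0] * len(x)
    for kernel, dilation in zip(kernels, dilations):
        branch = dilated_conv1d(x, kernel, dilation)
        context = [c + b for c, b in zip(context, branch)]
    value = [silu(c) for c in context]
    return [g * v for g, v in zip(gate, value)]


if __name__ == "__main__":
    signal = [0.0, 1.0, 2.0, 1.0, 0.0]
    avg3 = [1.0 / 3.0] * 3
    print(gated_aggregation(signal, [avg3, avg3], dilations=[1, 2]))
```

In this sketch the gate lets each position scale how much of the aggregated multi-scale context it passes on, which is one way to read "gathered and contextualized in an adaptive manner". A real 2D implementation would use learned, per-channel kernels rather than fixed averages.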