Deep learning has had enormous success on perceptual tasks but still struggles to provide a model for inference. To address this gap, we have been developing Memory-Attention-Composition networks (MACnets). The MACnet design provides a strong prior for explicitly iterative reasoning, enabling it to learn explainable, structured reasoning and to generalize well from a modest amount of data. The model builds on the success of existing recurrent cells such as LSTMs: a MACnet is a sequence formed by repeating a single recurrent Memory, Attention, and Composition (MAC) cell. However, its design imposes structural constraints on the operation of each cell and on the interactions between cells, incorporating explicit control and soft attention mechanisms. We demonstrate the model’s strength and robustness on the challenging CLEVR dataset for visual reasoning (Johnson et al. 2016), achieving a new state-of-the-art accuracy of 98.9% and halving the error rate of the previous best model. More importantly, we show that the new model is more data-efficient, achieving good results from even a modest amount of training data. Joint work with Drew Hudson.