Spotlight Poster
Learning the greatest common divisor: explaining transformer predictions
François Charton
Halle B
Abstract:
We train small transformers to calculate the greatest common divisor (GCD) of two positive integers, and show that their predictions are fully explainable. During training, models learn a list $\mathcal D$ of divisors, and predict the largest element of $\mathcal D$ that divides both inputs. We also show that the training distribution has a large impact on performance. Models trained on uniform operands learn only a handful of GCDs (up to $38$ out of $100$). Training on log-uniform operands raises this to $73$ correct GCDs, and rebalancing the distribution of GCDs in the training set, from inverse square to log-uniform, raises it to $91$. On the other hand, a uniform distribution of GCDs in the training set breaks model explainability.
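The prediction rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `predict_gcd`, the example list of divisors, and the fallback to $1$ when no learned divisor applies are all assumptions made for illustration.

```python
from math import gcd


def predict_gcd(a, b, learned_divisors):
    """Return the largest learned divisor that divides both a and b.

    Mimics the rule stated in the abstract. `learned_divisors` stands in
    for the list D; falling back to 1 when nothing matches is an
    assumption of this sketch, not a claim from the abstract.
    """
    common = [d for d in learned_divisors if a % d == 0 and b % d == 0]
    return max(common, default=1)


# Hypothetical list D, chosen only for illustration.
D = [1, 2, 4, 5, 10]

print(predict_gcd(20, 30, D), gcd(20, 30))  # 10 10 (10 is in D, prediction is correct)
print(predict_gcd(12, 18, D), gcd(12, 18))  # 2 6   (6 is not in D, prediction falls back to 2)
```

Under this rule a prediction is correct exactly when the true GCD belongs to $\mathcal D$, so the number of correct GCDs grows with the number of divisors the model has learned.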