Oral in Affinity Workshop: Tiny Papers Showcase Day (a DEI initiative)
Prune and Tune: Improving Efficient Pruning Techniques for Massive Language Models
Aaquib Syed · Phillip Guo
Abstract:
Massive language models with billions of parameters have significant compute expenses and thus can benefit from pruning. Pruning techniques for massive models are typically iterative and require extensive weight retraining after pruning. SparseGPT, a recently introduced one-shot technique for pruning such models, enables pruning without retraining. We improve upon SparseGPT by fine-tuning with minimal training steps during pruning. In experiments against magnitude pruning, our iteratively fine-tuned SparseGPT models significantly outperform their magnitude-pruned counterparts at high sparsity.
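The abstract describes interleaving brief fine-tuning with pruning steps. The sketch below is a minimal, illustrative PyTorch loop of that general prune-and-tune pattern, not the authors' implementation: plain magnitude pruning stands in for SparseGPT's one-shot weight-reconstruction step, and `model`, `dataloader`, and all hyperparameters are placeholders.

```python
# Illustrative prune-and-tune loop: raise sparsity in stages and fine-tune
# briefly after each pruning step. Magnitude pruning is used here only as a
# stand-in for SparseGPT's pruning step; names and values are placeholders.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def prune_and_tune(model, dataloader, target_sparsity=0.75, stages=3,
                   tune_batches=50, lr=1e-5, device="cpu"):
    model.to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    linear_layers = [m for m in model.modules() if isinstance(m, nn.Linear)]

    for stage in range(1, stages + 1):
        # Global sparsity schedule, e.g. 0.25 -> 0.50 -> 0.75.
        prev = target_sparsity * (stage - 1) / stages
        cur = target_sparsity * stage / stages
        # torch's iterative pruning applies `amount` to the still-unpruned
        # weights, so convert the global increment to a relative fraction.
        incremental = (cur - prev) / (1.0 - prev)

        # Pruning step (magnitude-based stand-in for SparseGPT).
        for layer in linear_layers:
            prune.l1_unstructured(layer, name="weight", amount=incremental)

        # Brief fine-tuning ("minimal training steps") on a few batches;
        # the pruning masks are enforced automatically during forward passes.
        model.train()
        for i, (inputs, targets) in enumerate(dataloader):
            if i >= tune_batches:
                break
            optimizer.zero_grad()
            logits = model(inputs.to(device))
            loss = loss_fn(logits.view(-1, logits.size(-1)),
                           targets.to(device).view(-1))
            loss.backward()
            optimizer.step()

    # Bake the final masks into the weights.
    for layer in linear_layers:
        prune.remove(layer, "weight")
    return model
```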