

Oral

LLMCarbon: Modeling the End-to-End Carbon Footprint of Large Language Models

Ahmad Faiz · Sotaro Kaneda · Ruhan Wang · Rita Osi · Prateek Sharma · Fan Chen · Lei Jiang

Oral 6B
Thu 9 May 7:15 a.m. — 7:30 a.m. PDT

Abstract:

The carbon footprint of large language models (LLMs) is substantial. It stems from their training, inference, experimentation, and storage, and it comprises both operational and embodied carbon emissions. Accurately estimating the carbon impact of an emerging LLM before it is actually trained, a process that involves substantial GPU usage, is crucial. Although many previous studies have reported the carbon footprint of LLM training, only one prior tool, mlco2, can predict the carbon footprint of a new neural network before it is physically trained. However, mlco2 has several limitations. First, it cannot extend its carbon footprint estimation to dense or mixture-of-experts (MoE) LLMs. Second, it disregards essential architectural parameters of networks, such as parameter counts, leading to inflated projections. Third, it focuses solely on GPUs, excluding TPUs, and assumes uniform peak computing throughput across GPUs, resulting in imprecise carbon footprint estimates. Finally, it cannot model the embodied carbon footprint of an LLM. To address these gaps, we present LLMCarbon, an end-to-end carbon footprint projection model designed for both dense and MoE LLMs. Compared to mlco2, LLMCarbon greatly improves the accuracy of carbon footprint estimates for various LLMs.
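To make the distinction between operational and embodied emissions concrete, here is a minimal Python sketch of how a pre-training estimate could be assembled from parameter count, hardware throughput, and data-center characteristics. It is not LLMCarbon's actual model. Every name in it is hypothetical, the compute estimate uses the common 6 x parameters x tokens rule of thumb for dense transformers as an assumption, and the embodied share is assumed to be a simple time-based amortization over hardware lifetime.

```python
from dataclasses import dataclass


@dataclass
class TrainingRun:
    """Hypothetical inputs for a back-of-the-envelope estimate (all assumed)."""
    params: float                  # model parameter count
    tokens: float                  # number of training tokens
    num_devices: int               # accelerators used
    peak_flops: float              # per-device peak throughput, FLOP/s
    efficiency: float              # achieved fraction of peak, in (0, 1]
    device_power_w: float          # average power draw per device, watts
    pue: float                     # data-center power usage effectiveness
    carbon_intensity: float        # kg CO2e per kWh of grid electricity
    embodied_kg_per_device: float  # manufacturing emissions per device
    device_lifetime_h: float       # assumed hardware lifetime, hours


def training_hours(run: TrainingRun) -> float:
    # Assumption: ~6 FLOPs per parameter per token for a dense transformer.
    total_flops = 6.0 * run.params * run.tokens
    achieved_flops_per_s = run.num_devices * run.peak_flops * run.efficiency
    return total_flops / achieved_flops_per_s / 3600.0


def operational_kg(run: TrainingRun) -> float:
    # Energy drawn by the devices, scaled by facility overhead (PUE)
    # and converted to emissions via the grid's carbon intensity.
    energy_kwh = training_hours(run) * run.num_devices * run.device_power_w / 1000.0
    return energy_kwh * run.pue * run.carbon_intensity


def embodied_kg(run: TrainingRun) -> float:
    # Assumption: manufacturing emissions are amortized by the fraction
    # of the hardware lifetime that the training run occupies.
    lifetime_share = training_hours(run) / run.device_lifetime_h
    return lifetime_share * run.num_devices * run.embodied_kg_per_device


def total_kg(run: TrainingRun) -> float:
    return operational_kg(run) + embodied_kg(run)


# Illustrative numbers only; they do not describe any real model or device.
example = TrainingRun(
    params=1e9, tokens=1e9, num_devices=10, peak_flops=1e14,
    efficiency=0.5, device_power_w=300.0, pue=1.2, carbon_intensity=0.5,
    embodied_kg_per_device=100.0, device_lifetime_h=10_000.0,
)
```

In this sketch the parameter count and the per-device efficiency both feed directly into the training time, which illustrates why ignoring architecture or assuming uniform peak throughput can skew an estimate. A mixture-of-experts model would need a different compute term, since only part of its parameters is active per token.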
