Poster in Workshop: Mathematical and Empirical Understanding of Foundation Models (ME-FoMo)
Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters
Boshi Wang · Sewon Min · Xiang Deng · Jiaming Shen · You Wu · Luke Zettlemoyer · Huan Sun
Keywords: [ Chain-of-Thought prompting ] [ multi-step reasoning ] [ in-context learning ] [ large language models ]
Chain-of-Thought (CoT) prompting, which encourages language models (LMs) to generate intermediate rationales for the final answer through in-context demonstrations, dramatically improves large LMs' ability to solve reasoning tasks. Despite its success, there is little understanding of what makes CoT prompting effective and which aspects of the demonstrated reasoning steps contribute to its performance. In this paper, we show that prompting with invalid demonstrations has little effect on CoT reasoning: it achieves 80-90% of the performance obtained with the original CoT under various metrics, while the model still generates coherent lines of reasoning during inference. Further experiments show that other aspects of the rationales, such as being relevant to the query and correctly ordering the reasoning steps, are the actual keys to the effectiveness of CoT. Overall, these findings deepen our understanding of CoT prompting, raise new questions about large LMs' capability to learn to reason in context, and prompt reflection on how few-shot reasoning is benchmarked.
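To make the setup concrete, below is a minimal sketch of how a CoT prompt and its invalid-rationale ablation might be constructed. The arithmetic question and rationales are illustrative examples written for this sketch (not drawn from the paper's benchmarks), and build_prompt is a hypothetical helper; the point is only that the ablated demonstration keeps the question and final answer but replaces the rationale with logically incorrect steps.

# Sketch of building CoT prompts with valid vs. invalid demonstrations.
# The demonstrations below are hypothetical toy examples, not the paper's data.

# Standard CoT demonstration: question, intermediate rationale, final answer.
VALID_DEMO = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger starts with 5 balls. 2 cans of 3 balls each is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n"
)

# Invalid-reasoning ablation: same question and final answer, but the
# intermediate steps are logically incorrect (hypothetical example).
INVALID_DEMO = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger starts with 5 balls. He loses 3 of them, so he has 8 balls. "
    "8 + 3 = 11. The answer is 11.\n"
)

def build_prompt(demos: list[str], query: str) -> str:
    """Concatenate in-context demonstrations with the test query."""
    return "\n".join(demos) + f"\nQ: {query}\nA:"

if __name__ == "__main__":
    query = "A baker has 12 cupcakes and bakes 7 more. How many cupcakes are there?"
    print(build_prompt([VALID_DEMO], query))    # original CoT prompt
    print(build_prompt([INVALID_DEMO], query))  # ablated, invalid-rationale prompt

Both prompts would then be sent to the LM, and the resulting answers and generated rationales compared; the paper's finding is that performance with the second style of prompt remains close to that of the first.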