Poster

BatchPrompt: Accomplish more with less

Jianzhe Lin · Maurice Diesendruck · Liang Du · Robin Abraham

Halle B
Fri 10 May 7:30 a.m. PDT — 9:30 a.m. PDT

Abstract:

The ever-increasing token limits of large language models (LLMs) have enabled long contexts as input. Many LLMs are trained or fine-tuned to perform zero-shot/few-shot inference using instruction-based prompts. Crafting prompts for these LLMs typically requires the user to provide a detailed task description, demonstrations, and a single example of context for inference. This regular prompt baseline is referred to as “SinglePrompt” in this paper. However, for NLP tasks where each data point for inference is not necessarily lengthy, the token count for instructions and few-shot examples in the prompt may be considerably larger than that of the data point, resulting in lower token-resource utilization compared with encoder-based models like fine-tuned BERT. This cost-efficiency issue, affecting inference speed and compute budget, counteracts the many benefits LLMs have to offer. This paper aims to alleviate the preceding problem by batching multiple data points into a single prompt, a prompting strategy we refer to as “BatchPrompt”. This strategy increases the “density” of data points, which in turn leads to improved token utilization. Applying BatchPrompt naïvely, however, is very challenging due to significant performance degradation, as observed in our experiments. We also noticed varying inference outcomes for the same data points appearing in different positions within a prompt. Based on this observation, to address the quality issue while maintaining high token-resource utilization, we introduce Batch Permutation and Ensembling (BPE) for BatchPrompt, a simple majority-voting method that recovers labeling quality by repeatedly permuting data positions within a batch, at the price of more token usage. To counterbalance the additional token usage caused by the voting process, we further propose Self-reflection-guided EArly Stopping (SEAS), which can terminate the voting process early for data points the LLM handles confidently. Our comprehensive experimental evaluation demonstrates that BPE+SEAS can boost the performance of BatchPrompt by a striking margin on a range of popular NLP tasks, including question answering (Boolq), textual entailment (RTE), and duplicate question identification (QQP). These results are competitive with, or even higher than, single-data prompting (SinglePrompt), while BatchPrompt requires far fewer LLM calls and input tokens (for SinglePrompt vs. BatchPrompt+BPE+SEAS with batch size 32: using just 15.7% of the LLM calls, Boolq accuracy 90.6% → 90.9% with 27.4% of the tokens, QQP accuracy 87.2% → 88.4% with 18.6% of the tokens, RTE accuracy 91.5% → 91.1% with 30.8% of the tokens). We hope our simple yet effective approach will shed light on future research on large language models. The code will be released.
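To make the prompting strategy more concrete, the following is a minimal Python sketch of how batching, permutation-based voting (BPE), and self-reflection-guided early stopping (SEAS) could fit together. The helper names (call_llm, build_batch_prompt), the prompt template, the confidence flag, and the fixed number of voting rounds are illustrative assumptions, not the authors' implementation; refer to the paper and the released code for the actual method.

import random
from collections import Counter

def batchprompt_bpe_seas(data_points, call_llm, num_rounds=5):
    """Sketch of BatchPrompt with Batch Permutation and Ensembling (BPE)
    and Self-reflection-guided EArly Stopping (SEAS).

    `call_llm(prompt)` is a placeholder for an LLM API call that returns,
    for each data point in the batched prompt, a (label, is_confident) pair.
    """
    votes = {i: [] for i in range(len(data_points))}  # collected labels per data point
    active = set(votes)                               # data points still being voted on

    for _ in range(num_rounds):
        if not active:
            break
        # BPE: permute the positions of the remaining data points in the batch.
        order = random.sample(sorted(active), len(active))
        prompt = build_batch_prompt([data_points[i] for i in order])
        answers = call_llm(prompt)                    # one (label, is_confident) per position
        for pos, idx in enumerate(order):
            label, is_confident = answers[pos]
            votes[idx].append(label)
            # SEAS: stop voting early on points the model reports handling confidently.
            if is_confident:
                active.discard(idx)

    # Majority vote over the collected labels for each data point.
    return [Counter(votes[i]).most_common(1)[0][0] for i in range(len(data_points))]

def build_batch_prompt(points):
    """Illustrative prompt template: task instruction plus numbered data points."""
    header = ("Answer each question with Yes or No, and state whether you are "
              "confident. Respond as '<index>: <answer> (<confident/unsure>)'.\n")
    body = "\n".join(f"{k + 1}. {p}" for k, p in enumerate(points))
    return header + body

In this sketch, token savings come from amortizing the instruction and few-shot examples over the whole batch, while BPE's repeated permutation and SEAS's early stopping trade off a small amount of extra token usage against labeling quality.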
