Poster in Workshop: Mathematical and Empirical Understanding of Foundation Models (ME-FoMo)
Z-ICL: Zero-Shot In-Context Learning with Pseudo-Demonstrations
Xinxi Lyu · Sewon Min · Iz Beltagy · Luke Zettlemoyer · Hannaneh Hajishirzi
Keywords: [ retrieval ] [ in-context learning ] [ NLP ] [ zero-shot learning ] [ prompting ] [ large language model ]
Language models (LMs) perform a new task at test time either through zero-shot inference or few-shot in-context learning, i.e., conditioning on k training examples (so-called demonstrations). Prior work suggests that in-context learning mainly activates an intrinsic ability of the LM. We argue that this implies the zero-shot performance of the LM is underestimated and can be as good as in-context learning if we inform the LM of the correct space of inputs and labels using pseudo-demonstrations. We also identify an additional factor, which we call the copying effect: if the pseudo-demonstrations include an input that is very similar to the test input, the model's prediction is heavily influenced by the label paired with that input. Putting these together, we introduce Z-ICL, a new zero-shot prompting method that constructs pseudo-demonstrations without any training data such that they (a) convey the correct space of inputs and labels and (b) reduce the copying effect, so that the prediction is less affected by the input-label pairings in the pseudo-demonstrations. Z-ICL (a) retrieves nearest neighbors of the test input from a raw text corpus and pairs them with random but valid labels, and (b) applies a set of techniques, such as physical neighbors and synonym labeling, to reduce the copying effect. Z-ICL outperforms previous zero-shot methods by a significant margin, and is on par with in-context learning with gold training data on a range of text classification datasets. Together, these results provide a significantly higher estimate of the model's ability to perform a new task zero-shot, and pose a set of new questions about the capabilities of LMs.
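To make the construction concrete, here is a minimal sketch (ours, not the authors' released code) that builds a Z-ICL prompt for a single test input. The toy corpus, the word-overlap retriever (standing in for the dense retriever a real implementation would use), the label words, and the prompt template are all illustrative assumptions.

```python
import random

# Z-ICL pseudo-demonstration construction: a minimal sketch under assumed
# components. CORPUS is a toy stand-in for a large raw-text corpus, kept in
# document order so adjacent sentences are "physical neighbors".
CORPUS = [
    "The plot was predictable and the acting wooden.",
    "Still, the supporting cast does what it can with the script.",
    "A beautifully shot film with a moving score.",
    "Its final act earns every minute of the runtime.",
    "The service was slow but the food made up for it.",
    "We will not be going back to that restaurant.",
]

def similarity(a: str, b: str) -> float:
    """Stand-in similarity: Jaccard overlap of word sets (the paper would
    use dense sentence embeddings instead)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(1, len(ta | tb))

def z_icl_prompt(test_input: str, corpus: list[str], label_words: list[str],
                 k: int = 4, seed: int = 0) -> str:
    rng = random.Random(seed)
    # (1) Retrieve the k corpus sentences nearest to the test input, so the
    # pseudo-demonstrations come from the correct input space.
    ranked = sorted(range(len(corpus)),
                    key=lambda i: similarity(test_input, corpus[i]),
                    reverse=True)
    # (2) Physical neighbors: swap each retrieved sentence for an adjacent
    # sentence in the corpus, so no pseudo-input is near-identical to the
    # test input (this is what mitigates the copying effect).
    pseudo_inputs = [corpus[min(i + 1, len(corpus) - 1)] for i in ranked[:k]]
    # (3) Pair each pseudo-input with a random but valid label word. With
    # synonym labeling, label_words would hold synonyms of the true label
    # words (e.g., "good"/"bad" instead of "positive"/"negative").
    demos = [(x, rng.choice(label_words)) for x in pseudo_inputs]
    # (4) Format as an ordinary in-context-learning prompt ending with the
    # real test input.
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in demos]
    blocks.append(f"Input: {test_input}\nOutput:")
    return "\n\n".join(blocks)

print(z_icl_prompt("A moving film with gorgeous cinematography.",
                   CORPUS, ["positive", "negative"], k=2))
```

The key design choice is step (2): retrieval supplies inputs from the right distribution, while taking physical neighbors of the retrieved sentences keeps any near-duplicate of the test input out of the prompt, so its randomly assigned label cannot simply be copied into the prediction.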