Oral in Workshop: Robustness of Zero/Few-shot Learning in Foundation Models (R0-FoMo)
Teaching language models with canonical examples
John Hewitt · Sarah Chen · Percy Liang · Christopher D Manning
It is easy to write down a desirable or undesirable language model behavior (e.g., knowledge: "The capital of Mauritius is Port Louis"; or an undesirable stereotype: "Researchers are always coldhearted"), but it is difficult to make the model robustly generalize from these canonical examples. We formalize this task: a learning method takes a model and simple canonical examples and must produce a model that (1) generalizes to naturalistic examples, (2) stays within a bound of the original model's loss, and (3) performs well on a "hard negative" distribution that tests for overgeneralization. We build on the Backpack language model, whose predictions take the form of a sparse weighted sum over a very large bank of sense vectors. We select and finetune a few Backpack senses per canonical example and find that this substantially outperforms other training methods. The Backpack we work with has only 170M parameters, yet we find that it can improve much larger models: a product-of-experts ensemble between the 35x larger GPT-J-6B and the ratio of the finetuned to the pretrained Backpack outperforms finetuning GPT-J itself.
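The three criteria can be read as a constrained objective. The following is a minimal sketch of one plausible formalization, not the paper's own notation: D_gen (naturalistic examples), D_pre (the pretraining distribution), the success measure, and the slack epsilon are illustrative, and performance on the hard-negative distribution is evaluated separately to detect overgeneralization.

\[
\max_{\theta'} \; \mathbb{E}_{x \sim D_{\mathrm{gen}}}\big[\mathrm{success}(x;\theta')\big]
\quad \text{subject to} \quad
\mathbb{E}_{x \sim D_{\mathrm{pre}}}\big[\mathcal{L}(x;\theta')\big]
\le (1+\epsilon)\,\mathbb{E}_{x \sim D_{\mathrm{pre}}}\big[\mathcal{L}(x;\theta)\big].
\]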
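Because a Backpack's logits decompose over a bank of sense vectors, an edit can be localized to a handful of senses. Below is a hedged PyTorch sketch of that idea rather than the paper's implementation: `backpack`, `backpack.sense_vectors`, `loss_fn`, and `canonical_batch` are placeholder names, and ranking senses by gradient magnitude is one simple selection heuristic.

import torch

def select_top_senses(backpack, loss_fn, canonical_batch, k=6):
    # Rank every (word, sense) vector by the gradient magnitude of the
    # canonical-example loss, as a proxy for its relevance to the edit.
    senses = backpack.sense_vectors            # assumed shape: (vocab, num_senses, dim)
    senses.requires_grad_(True)
    loss = loss_fn(backpack, canonical_batch)
    (grad,) = torch.autograd.grad(loss, senses)
    scores = grad.norm(dim=-1)                 # one score per (word, sense) pair
    top = torch.topk(scores.flatten(), k).indices
    return [divmod(i.item(), scores.shape[1]) for i in top]

def finetune_senses(backpack, loss_fn, canonical_batch, k=6, steps=10, lr=1e-3):
    picked = select_top_senses(backpack, loss_fn, canonical_batch, k)
    for p in backpack.parameters():            # freeze the whole model ...
        p.requires_grad_(False)
    senses = backpack.sense_vectors
    senses.requires_grad_(True)                # ... except the sense-vector bank
    mask = torch.zeros_like(senses)
    for word_id, sense_id in picked:
        mask[word_id, sense_id] = 1.0
    opt = torch.optim.Adam([senses], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(backpack, canonical_batch).backward()
        senses.grad.mul_(mask)                 # keep updates to the selected senses only
        opt.step()

Freezing everything except the selected sense vectors is what keeps the edit small: the rest of the model, and hence its behavior away from the canonical example, is untouched.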
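The product-of-experts ensemble is straightforward to state in code. In this hedged sketch, `gpt_j`, `backpack_ft`, and `backpack_pre` are placeholders for Hugging-Face-style causal LMs returning `.logits`, and the models are assumed to share a tokenizer and vocabulary (in practice GPT-J's padded vocabulary may need slicing to match):

import torch

@torch.no_grad()
def ensemble_next_token_logits(gpt_j, backpack_ft, backpack_pre, input_ids):
    # log p_ensemble is proportional to
    # log p_GPT-J + (log p_finetuned_backpack - log p_pretrained_backpack),
    # so the Backpack pair contributes only the change induced by finetuning.
    big = gpt_j(input_ids).logits[:, -1]       # next-token logits, (batch, vocab)
    edit = (backpack_ft(input_ids).logits[:, -1].log_softmax(-1)
            - backpack_pre(input_ids).logits[:, -1].log_softmax(-1))
    return big + edit

Where the finetuned and pretrained Backpacks agree, the log-ratio is zero and GPT-J's prediction passes through unchanged; the small model speaks only where finetuning changed its mind.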