Poster in Workshop: Instruction Tuning and Instruction Following
Self-RAG: Self-reflective Retrieval Augmented Generation
Akari Asai · Zeqiu Wu · Yizhong Wang · Avi Sil · Hannaneh Hajishirzi
Keywords: [ factuality ] [ Retrieval-augmented Language Models ] [ language models ] [ Retrieval Augmentation ]
Scaling up language models (LMs) or applying instruction tuning has shown limited effect on the factuality of LM outputs. Retrieval-Augmented Generation (RAG), an ad hoc approach that augments LMs with retrieval, reduces hallucination in large LMs. However, indiscriminately retrieving and incorporating a fixed number of passages, regardless of whether retrieval is necessary or the passages are relevant, diminishes the versatility of instruction-following LMs or can lead to unhelpful responses. In this work, we introduce a new framework called Self-Reflective Retrieval-Augmented Generation (Self-RAG) that enhances an LM's quality and factuality through retrieval and self-reflection. Our framework trains a single arbitrary LM to adaptively retrieve passages on demand, and to generate and reflect on retrieved passages and its own generations using special tokens, called reflection tokens; training uses diverse instruction-tuning data interleaved with retrieved passages and reflection tokens. Generating reflection tokens makes the LM controllable during the inference phase, enabling it to tailor its behavior to diverse task requirements. Experiments show that Self-RAG (7B and 13B parameters) significantly outperforms state-of-the-art pre-trained and instruction-following LLMs and retrieval-augmented models on a diverse set of tasks. Specifically, Self-RAG outperforms ChatGPT and retrieval-augmented Llama2-chat on open-domain QA, fact verification, and reasoning tasks, and it shows significant gains in factuality scores and citation accuracy for long-form generations relative to these models.
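The inference-time behavior described in the abstract (retrieve only when needed, generate per passage, then use reflection signals to choose an output) can be illustrated with a small Python sketch. This is not the authors' implementation: the names `Candidate`, `select_response`, the `relevance` and `support` keys, and the toy stub functions are all hypothetical stand-ins, and a real system would read these signals from reflection tokens emitted by the trained LM rather than from hand-written stubs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Candidate:
    passage: str                  # retrieved passage the output was conditioned on
    text: str                     # generated continuation
    reflection: Dict[str, float]  # hypothetical reflection scores in [0, 1]


def select_response(
    prompt: str,
    needs_retrieval: Callable[[str], bool],
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, str], Candidate],
    weights: Dict[str, float],
) -> str:
    """Toy on-demand retrieval followed by reflection-weighted selection."""
    # Step 1: skip retrieval entirely when the model signals it is unnecessary.
    if not needs_retrieval(prompt):
        return generate(prompt, "").text

    # Step 2: produce one candidate per retrieved passage.
    candidates = [generate(prompt, p) for p in retrieve(prompt)]
    if not candidates:
        return generate(prompt, "").text

    # Step 3: rank candidates by a weighted sum of reflection scores.
    # The weights are the inference-time control knob (an assumption here).
    def score(c: Candidate) -> float:
        return sum(weights.get(k, 0.0) * v for k, v in c.reflection.items())

    return max(candidates, key=score).text


# Hypothetical stubs standing in for the trained LM and the retriever.
def toy_needs_retrieval(prompt: str) -> bool:
    return "capital" in prompt.lower()


def toy_retrieve(prompt: str) -> List[str]:
    return ["France is known for its cheese.", "Paris is the capital of France."]


def toy_generate(prompt: str, passage: str) -> Candidate:
    if not passage:
        return Candidate("", "No retrieval needed.", {})
    hit = 1.0 if "capital" in passage else 0.0
    return Candidate(passage, f"Based on: {passage}", {"relevance": hit, "support": hit})


WEIGHTS = {"relevance": 1.0, "support": 1.0}
factual = select_response("What is the capital of France?", toy_needs_retrieval,
                          toy_retrieve, toy_generate, WEIGHTS)
creative = select_response("Write a haiku about rain.", toy_needs_retrieval,
                           toy_retrieve, toy_generate, WEIGHTS)
```

Changing `WEIGHTS` shifts which candidate wins without retraining anything, which is one simple way to picture the controllability the abstract attributes to reflection tokens.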