Poster in Workshop: eXplainable AI approaches for debugging and diagnosis
Interpreting Language Models Through Knowledge Graph Extraction
Vinitra Swamy · Angelika Romanou · Martin Jaggi
Transformer-based language models trained on large text corpora have enjoyed immense popularity in the natural language processing (NLP) community and are commonly used as a starting point for downstream NLP tasks. While these models are undeniably useful, it is a challenge to quantify their performance beyond traditional accuracy metrics. We aim to compare BERT-based language models through snapshots of acquired knowledge at sequential stages of the training process. Structured relationships from training corpora may be uncovered by querying a masked language model with probing tasks. In this paper, we present a methodology to unveil a knowledge acquisition timeline by generating knowledge graph extracts from cloze "fill-in-the-blank" statements at various stages of RoBERTa's early training. We extend this analysis to a comparison of pretrained variations of BERT models (DistilBERT, BERT-base, RoBERTa). This work offers a quantitative framework to compare language models through knowledge graph extraction and showcases a part-of-speech analysis to identify the linguistic strengths of each model variant. These analyses enable machine learning practitioners to compare models, diagnose their models' behavioral strengths and weaknesses, and identify new targeted datasets to improve model performance.
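The abstract's core mechanism is probing a masked language model with cloze statements and treating the top predictions as candidate knowledge-graph triples. The sketch below illustrates that idea using the HuggingFace transformers fill-mask pipeline; the specific prompts, relation names, and triple construction are illustrative assumptions, not the authors' exact extraction pipeline.

```python
# Minimal sketch: probe a masked language model with cloze "fill-in-the-blank"
# statements and collect (subject, relation, object) triples as a toy
# knowledge graph extract. Assumes the HuggingFace `transformers` library;
# prompts and relations here are illustrative, not the paper's actual set.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")

# Hypothetical cloze probes: (subject, relation, prompt with a <mask> token).
probes = [
    ("Paris", "capital-of", "Paris is the capital of <mask>."),
    ("Dante", "born-in", "Dante was born in <mask>."),
]

triples = []
for subject, relation, prompt in probes:
    # Take the model's top prediction for the masked token as the object node.
    top = fill_mask(prompt, top_k=1)[0]
    triples.append((subject, relation, top["token_str"].strip(), top["score"]))

for triple in triples:
    print(triple)  # e.g. ('Paris', 'capital-of', 'France', <prediction score>)
```

Swapping `"roberta-base"` for `"distilbert-base-uncased"` or `"bert-base-uncased"` (and `<mask>` for `[MASK]`) would yield comparable extracts across the model variants discussed in the abstract, which could then be scored against a reference knowledge graph.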