Poster in Workshop: Foundation Model Interventions
Linguistic Minimal Pairs Elicit Linguistic Similarity in Large Language Models
Xinyu Zhou · Delong Chen · Samuel Cahyawijaya · Xufeng Duan · Zhenguang Cai
Keywords: Large Language Model; Linguistic minimal pairs
We introduce a novel analysis that leverages linguistic minimal pairs to probe the internal linguistic representations of Large Language Models (LLMs). By measuring the similarity between the activation differences that LLMs produce for minimal pairs, we quantify linguistic similarity and gain insight into the linguistic knowledge captured by LLMs. Our large-scale experiments, spanning more than 100 LLMs and 150,000 minimal pairs in three languages, reveal that linguistic similarity is more consistent in high-resource languages, is influenced by training data composition, and aligns strongly with fine-grained theoretical linguistic categories but weakly with broader ones. This work demonstrates the potential of minimal pairs as a window into the neural representations of language, shedding light on the relationship between LLMs and linguistic theory.
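The abstract does not spell out the exact computation, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each sentence is summarized by a single activation vector (for example, a mean-pooled hidden state from one layer), that a pair's difference is taken as acceptable minus unacceptable, and that the similarity between two phenomena is the cosine similarity of their mean difference vectors. The helper names (`difference`, `linguistic_similarity`), the phenomenon labels, and the toy vectors are all hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[Vector, Vector]  # (acceptable sentence activation, unacceptable sentence activation)


def difference(good: Vector, bad: Vector) -> List[float]:
    """Activation difference for one minimal pair: acceptable minus unacceptable (assumed sign)."""
    return [g - b for g, b in zip(good, bad)]


def mean_vector(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def linguistic_similarity(
    pairs_by_phenomenon: Dict[str, List[Pair]],
) -> Dict[Tuple[str, str], float]:
    """Hypothetical helper: cosine similarity between the mean difference
    vectors of every two phenomena (an assumed aggregation scheme)."""
    centroids = {
        name: mean_vector([difference(good, bad) for good, bad in pairs])
        for name, pairs in pairs_by_phenomenon.items()
    }
    return {
        (a, b): cosine(centroids[a], centroids[b])
        for a, b in combinations(sorted(centroids), 2)
    }


if __name__ == "__main__":
    # Toy 3-dimensional "activations"; real hidden states are far larger.
    toy = {
        "agreement": [([1.0, 0.0, 0.5], [0.0, 0.0, 0.5]),
                      ([2.0, 1.0, 0.0], [1.0, 1.0, 0.0])],
        "anaphor": [([3.0, 0.0, 1.0], [1.0, 0.0, 1.0])],
        "island": [([0.0, 1.0, 0.0], [0.0, 0.0, 0.0])],
    }
    for (a, b), sim in linguistic_similarity(toy).items():
        print(f"{a} vs {b}: {sim:.2f}")
```

In the toy data, the "agreement" and "anaphor" pairs shift activations in the same direction (similarity 1.00), while "island" shifts them orthogonally (similarity 0.00). Under this reading, comparing such similarity matrices across models, or against theoretical category labels, would be one way to examine the consistency and alignment findings described above.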