

Poster in Workshop: Foundation Model Interventions

Linguistic Minimal Pairs Elicit Linguistic Similarity in Large Language Models

Xinyu Zhou · Delong Chen · Samuel Cahyawijaya · Xufeng Duan · Zhenguang Cai

Keywords: [ Large Language Model ] [ Linguistic minimal pairs ]


Abstract:

We introduce a novel analysis that leverages linguistic minimal pairs to probe the internal linguistic representations of Large Language Models (LLMs). By measuring the similarity between LLM activation differences across minimal pairs, we quantify linguistic similarity and gain insight into the linguistic knowledge captured by LLMs. Our large-scale experiments, spanning over 100 LLMs and 150,000 minimal pairs in three languages, reveal that linguistic similarity is more consistent in high-resource languages, is influenced by training data composition, and aligns strongly with fine-grained theoretical linguistic categories but only weakly with broader ones. This work demonstrates the potential of minimal pairs as a window into the neural representations of language, shedding light on the relationship between LLMs and linguistic theory.
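The abstract does not specify implementation details, but the core idea can be illustrated with a minimal sketch: for each minimal pair, take the difference between the model's activations on the acceptable and unacceptable sentence, then compare these difference vectors across pairs. The model name, mean pooling over tokens, and cosine similarity used below are illustrative assumptions, not details taken from the paper.

```python
# Sketch: activation-difference vectors for linguistic minimal pairs and their
# pairwise similarity. Model choice, pooling, and similarity metric are
# assumptions made for illustration only.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "gpt2"  # hypothetical choice; the paper covers 100+ LLMs
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def sentence_activation(text: str) -> torch.Tensor:
    """Mean-pooled last-layer hidden state for a sentence (one assumed pooling choice)."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)

def pair_difference(acceptable: str, unacceptable: str) -> torch.Tensor:
    """Activation difference elicited by a single minimal pair."""
    return sentence_activation(acceptable) - sentence_activation(unacceptable)

# Two illustrative minimal pairs (subject-verb agreement).
pairs = [
    ("The keys are on the table.", "The keys is on the table."),
    ("The authors write papers.", "The authors writes papers."),
]
diffs = torch.stack([pair_difference(a, u) for a, u in pairs])

# "Linguistic similarity" between the two pairs, here operationalized as
# cosine similarity between their activation-difference vectors.
similarity = torch.nn.functional.cosine_similarity(diffs[0], diffs[1], dim=0)
print(f"Cosine similarity between pair differences: {similarity.item():.3f}")
```

Under this reading, pairs that probe the same linguistic phenomenon would be expected to yield more similar difference vectors than pairs probing unrelated phenomena.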
