Poster
Zipfian Whitening
Sho Yokoi · Han Bao · Hiroto Kurita · Hidetoshi Shimodaira
It has become clear that word embedding spaces are skewed and that correcting this skew can improve task performance. We point out that most approaches for modeling, correcting, and measuring the symmetry of an embedding space implicitly assume that word frequencies are uniform; in reality, word frequencies follow a power-law distribution, also known as Zipf's law. Surprisingly, simply performing PCA whitening weighted by the empirical word frequencies significantly improves task performance, surpassing established baselines including all-but-the-top, a pioneering isotropization method, and SIF, a strong sentence-vector construction method. From a theoretical perspective, the two families of approaches can be cleanly distinguished by whether the base measure of the exponential family is uniform or Zipfian. Adopting the latter naturally emphasizes informative low-frequency words in terms of their vector norms, which becomes evident from an information geometry perspective. Additionally, our theory provides a unified explanation for why popular natural language processing methods, namely skip-gram, SimCSE, and headless language models, perform so well.
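The core operation described above can be sketched in a few lines of NumPy: instead of treating every word equally, the mean and covariance used for PCA whitening are weighted by the empirical word frequencies. This is a minimal illustrative sketch, not the authors' reference implementation; the function name `zipfian_whitening` and the arguments `W` (a vocabulary-by-dimension embedding matrix) and `p` (empirical word frequencies summing to one) are hypothetical names chosen for the example.

```python
import numpy as np

def zipfian_whitening(W, p):
    """Hypothetical sketch of frequency-weighted PCA whitening.

    W : (V, d) array of word embeddings (one row per vocabulary word)
    p : (V,) array of empirical word frequencies, summing to 1
    Returns a (V, d) array whose frequency-weighted mean is zero and
    whose frequency-weighted covariance is the identity.
    """
    mu = p @ W                        # frequency-weighted mean vector
    Wc = W - mu                       # center the embeddings
    cov = (Wc * p[:, None]).T @ Wc    # frequency-weighted covariance (d, d)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Rotate onto the principal axes and rescale each axis to unit variance.
    return (Wc @ eigvecs) / np.sqrt(eigvals)
```

Using a uniform `p` (every entry `1/V`) recovers ordinary PCA whitening; the point of the abstract is that substituting the Zipfian empirical frequencies is what yields the improvement.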