Poster in Workshop: Transfer Learning for Natural Language Processing
DistillEmb: Distilling Word Embeddings via Contrastive Learning
Amanuel Mersha · Stephen Wu
Word embeddings powered the early days of neural network-based NLP research. Their effectiveness in small data regimes keeps them relevant in low-resource settings. However, they are limited in two critical ways: memory requirements that grow linearly with vocabulary size, and poor handling of out-of-vocabulary tokens. In this work, we present a technique for distilling word embeddings into a CNN using contrastive learning. The method regresses a token's embedding from its characters, and the trained CNN is then used as a pretrained layer in place of a word-embedding table. Low-resource languages are the primary beneficiaries of this approach, so we demonstrate its effectiveness on two morphology-rich Semitic languages and on a multilingual NER task spanning 10 African languages. Apart from improving performance and lowering memory usage, the model is data efficient and can transfer word representations to a similar language.
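To make the idea concrete, the following is a minimal sketch, not the authors' implementation, of distilling a word-embedding table into a character-level CNN with a contrastive objective. The architecture details (character embedding size, kernel sizes, the InfoNCE-style loss, and all variable names) are illustrative assumptions; the abstract only specifies that a CNN regresses embeddings from a token's characters and is trained with contrastive learning.

```python
# Sketch: distill pretrained word embeddings into a char-CNN via a contrastive loss.
# All hyperparameters and names below are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CharCNNEmbedder(nn.Module):
    """Maps a token's character IDs to a word-embedding-sized vector."""

    def __init__(self, n_chars=256, char_dim=32, emb_dim=300,
                 kernel_sizes=(3, 4, 5), n_filters=100):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            [nn.Conv1d(char_dim, n_filters, k, padding=k // 2) for k in kernel_sizes]
        )
        self.proj = nn.Linear(n_filters * len(kernel_sizes), emb_dim)

    def forward(self, char_ids):                      # char_ids: (batch, max_token_len)
        x = self.char_emb(char_ids).transpose(1, 2)   # (batch, char_dim, len)
        feats = [conv(x).max(dim=2).values for conv in self.convs]  # max-pool over chars
        return self.proj(torch.cat(feats, dim=1))     # (batch, emb_dim)


def contrastive_distill_loss(pred, target, temperature=0.1):
    """InfoNCE-style loss: each predicted vector should match its own target
    embedding and be dissimilar from the other targets in the batch."""
    pred = F.normalize(pred, dim=1)
    target = F.normalize(target, dim=1)
    logits = pred @ target.t() / temperature          # (batch, batch) similarities
    labels = torch.arange(pred.size(0), device=pred.device)
    return F.cross_entropy(logits, labels)


if __name__ == "__main__":
    torch.manual_seed(0)
    model = CharCNNEmbedder()
    # Dummy batch: character IDs for 8 tokens and their pretrained 300-d embeddings.
    char_ids = torch.randint(1, 256, (8, 12))
    teacher_embs = torch.randn(8, 300)
    loss = contrastive_distill_loss(model(char_ids), teacher_embs)
    loss.backward()
    print(f"distillation loss: {loss.item():.4f}")
```

After training, the CNN replaces the embedding table: any token, including out-of-vocabulary ones, gets a vector from its characters, and memory no longer grows with vocabulary size.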