Poster in Affinity Event: Black in AI
Can CNNs learn to encode word representation?
Amanuel Mersha · Tigabu Dagne Akal · Fitsum Alemu
Word embeddings powered the early days of neural network-based NLP research. Their effectiveness in small-data regimes keeps them relevant in low-resource environments. However, they are limited in two critical ways: their memory requirements grow linearly with vocabulary size, and they handle out-of-vocabulary tokens poorly. In this work, we present a technique for distilling word embeddings into a CNN using contrastive learning. The method allows a token's embedding to be regressed from its characters. The trained CNN is then used as a pretrained layer in place of word embeddings. Low-resource languages are the primary beneficiaries of this method; hence, we show its effectiveness on two morphologically rich Semitic languages and on a multilingual NER task comprising 10 African languages. The resulting model is data-efficient and improves both performance and memory footprint. Furthermore, unlike word embeddings, it easily supports cross-language knowledge transfer.
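As a rough illustration of the idea, the sketch below runs an untrained character-level CNN over a few tokens and scores its outputs against "teacher" word embeddings with a contrastive loss. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the architecture or the exact loss, so the single convolution layer with max-over-time pooling, the InfoNCE-style objective, the toy vocabulary, the random teacher vectors, and all names (`char_cnn`, `info_nce`, `teacher`) are hypothetical. Only the forward pass and the loss are shown; no training is performed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: a tiny vocabulary with random "teacher" word embeddings.
vocab = ["selam", "bet", "lij", "wuha"]
emb_dim = 8
teacher = rng.normal(size=(len(vocab), emb_dim))

# Character inventory; id 0 is reserved for padding and unseen characters.
chars = sorted(set("".join(vocab)))
char_to_id = {c: i + 1 for i, c in enumerate(chars)}

# Assumed architecture: one 1D convolution over character embeddings.
char_dim, width = 6, 3
char_emb = rng.normal(scale=0.1, size=(len(chars) + 1, char_dim))
char_emb[0] = 0.0
kernel = rng.normal(scale=0.1, size=(width, char_dim, emb_dim))
bias = np.zeros(emb_dim)


def char_cnn(word):
    """Regress an embedding from the characters of a token."""
    ids = np.array([char_to_id.get(c, 0) for c in word], dtype=int)
    x = char_emb[ids]                                  # (len, char_dim)
    x = np.pad(x, ((width - 1, width - 1), (0, 0)))    # zero-pad both ends
    n_windows = x.shape[0] - width + 1
    windows = np.stack([x[i:i + width] for i in range(n_windows)])
    feats = np.einsum("nwc,wcd->nd", windows, kernel) + bias
    return np.tanh(feats).max(axis=0)                  # max-over-time pooling


def info_nce(student, target, temperature=0.1):
    """InfoNCE-style loss: each student vector should match its own target
    embedding and not the other targets in the batch."""
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    t = target / np.linalg.norm(target, axis=1, keepdims=True)
    logits = s @ t.T / temperature                     # (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))


student = np.stack([char_cnn(w) for w in vocab])       # (len(vocab), emb_dim)
loss = info_nce(student, teacher)

# Tokens absent from the vocabulary still receive an embedding.
oov_vector = char_cnn("selamta")
```

Because the encoder reads characters rather than looking up a row in a table, its parameter count does not depend on vocabulary size, and a token never seen during distillation still maps to a vector.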