Poster in Affinity Event: Muslims in ML
QMorphVec: A Morphologically-Aware Embedding of Quranic Vocabulary
Doratossadat Dastgheib · Alireza Sahebi · Ehsan Khadangi · Ehsaneddin Asgari
Keywords: [ Morphologically-Aware Embedding ] [ Quranic Embedding ]
Developing effective word representations that incorporate linguistic features and capture contextual information is an essential step in natural language processing (NLP) tasks. When working with a corpus from a specific domain with profound meanings, such as the Holy Quran, deriving word representations from domain-specific textual contexts is particularly valuable. In this research, we employ a context-masking approach to generate separate embedding spaces for Quranic roots, lemmas, and surface forms, and then project them into a common space through a linear mapping. We demonstrate that our in-domain embeddings, trained solely on Quranic text and its morphological contexts, perform comparably to, and in some cases better than, OpenAI's large embeddings, while surpassing the multilingual XLM-R embeddings. Additionally, through qualitative analysis, we illustrate their utility in Quranic word analogy tasks. The code and the embeddings are available at: [anonymized for the double-blind review].
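The abstract does not say how the linear mapping into the common space is fitted or how analogies are scored. The sketch below assumes two common choices: an unconstrained least-squares map learned from anchor pairs (for example, each lemma vector paired with the vector of one of its surface forms), and analogy queries answered with the vector-offset rule (b - a + c, nearest neighbour by cosine). The function names `fit_linear_map` and `analogy` and the toy vectors are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def fit_linear_map(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Fit W minimizing ||src @ W - tgt||_F by ordinary least squares.

    Assumption: row i of `src` (e.g. a lemma-space vector) and row i of
    `tgt` (e.g. a surface-form-space vector) form an anchor pair.
    """
    W, *_ = np.linalg.lstsq(src, tgt, rcond=None)
    return W


def normalize(m: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm so dot products become cosines."""
    return m / np.linalg.norm(m, axis=1, keepdims=True)


def analogy(vocab: list[str], emb: np.ndarray, a: str, b: str, c: str) -> str:
    """Answer 'a is to b as c is to ?' with the vector-offset rule.

    Returns the word whose vector has the highest cosine similarity
    to b - a + c, excluding the three query words themselves.
    """
    idx = {w: i for i, w in enumerate(vocab)}
    e = normalize(emb)
    query = e[idx[b]] - e[idx[a]] + e[idx[c]]
    scores = e @ (query / np.linalg.norm(query))
    for w in (a, b, c):
        scores[idx[w]] = -np.inf  # never return a query word
    return vocab[int(np.argmax(scores))]


if __name__ == "__main__":
    # Toy data: a source space mapped exactly into a target space.
    rng = np.random.default_rng(0)
    src = rng.normal(size=(50, 8))
    W_true = rng.normal(size=(8, 8))
    W = fit_linear_map(src, src @ W_true)
    print("map recovered:", np.allclose(W, W_true))

    # Toy vocabulary with hypothetical tokens; in practice these would be
    # vectors already projected into the common space (e.g. src @ W).
    vocab = ["m", "w", "k", "q", "x"]
    emb = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, -1]],
                   dtype=float)
    print("m : k :: w :", analogy(vocab, emb, "m", "k", "w"))
```

Least squares is the simplest choice; an orthogonality constraint (Procrustes) is a frequent alternative when the spaces should be aligned without distortion, and nothing in the abstract indicates which variant the authors use.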