

Poster

Fit for our purpose, not yours: Benchmark for a low-resource, Indigenous language

Suzanne Duncan · Gianna Leoni · Lee Steven · Keoni K Mahelona · Peter Lucas K Jones

West Ballroom A-D #5409
Fri 13 Dec 11 a.m. PST — 2 p.m. PST

Abstract:

Influential and popular benchmarks in AI are largely irrelevant to developing NLP tools for low-resource, Indigenous languages. Because their primary goal is to measure the performance of general-purpose AI systems, these benchmarks fail to give due consideration and care to individual language communities, especially those of low-resource languages. Their datasets contain numerous grammatical and orthographic errors, poor pronunciation, and limited vocabulary, and their content lacks cultural relevance to the language community. To overcome these issues, we have created a dataset for te reo Māori (the Indigenous language of Aotearoa/New Zealand) to pursue NLP tools that are ‘fit-for-our-purpose’. This paper demonstrates how low-resource, Indigenous languages can develop tailored, high-quality benchmarks that: i. consider the impact of colonisation on their language; ii. reflect the diversity of speakers in the language community; and iii. support the aspirations for the tools they are developing and their language revitalisation efforts.
