

Oral in Affinity Workshop: Global South AI

Hindi/Hinglish words used in Gen AI

Tapasya Sariya

Keywords: [ language specific words ] [ Gender Neutral words ] [ collective nouns ] [ slangs ]


Abstract:

Hindi is one of the most widely spoken languages, used by around 57% of people in the Indian subcontinent. It is therefore important that generative AI understands and responds well to Hindi prompts without bias. When prompted on Bard with gender-neutral words such as “shishya” (which translates to student/pupil in English), it shows only male students and neglects females. It also reverts to the traditional guru-shishya relationship rather than depicting the present-day student. During these observations, the tool sometimes returned no response at all, even when it had understood the prompt, and just said “sorry” in Hindi. In other cases, for example the prompt “एशियाई डॉक्टरों की तस्वीरें” (“pictures of Asian doctors”), Stable Diffusion and even Bard would return only doctors from East Asian countries or images of men. West Asian communities and women are not represented. This is real unfairness in the systems and training datasets, which can be worked on and changed. LLMs need to be trained on local languages with multiple datasets that cover wider regions and communities, without oppressing sections of society or sidelining the terminology (slang) used in different parts of it. There must be fairness in the system for people to have more faith in AI and to use it to their benefit. If we talk about having no bias between humans, we should make sure AI does the same.
