Poster
in
Affinity Workshop: Global South in AI
Automated Misinformation: Mistranslation of news feed using multi-lingual translation systems in Facebook
Sundaraparipurnan Narayanan
Keywords: [ misinformation ] [ mistranslation ] [ multilingual translation ] [ NLP ]
Machine translations have evolved over the past decade and are increasingly used in multiple applications. Increasingly, translation models focus on becoming multilingual, enabling translations across hundreds of languages, including many low-resource languages (e.g. Facebook's No Languages Left Behind can translate text from and to 200 languages). Facebook, in their announcement, also mentioned that NLLB would support 25 billion translations served daily on Facebook News Feed, Instagram, and other platforms. A Facebook user receives auto-translated (machine-translated) content on his news feed based on the language setting and translation preferences updated on the platform. Multilingual translation models are not free from errors. These errors are typically caused by a lack of adequate context or domain-specific words, ambiguity or sarcasm in the text, incorrect dialect, missing words, transliteration instead of translation, incorrect lexical choice, and differences in grammatical properties between languages. Such errors, may, on some occasions, lead to misinformation about the text translated. This paper examines instances of misinformation caused by mistranslations from English to Tamil in the Facebook news feed. For the purpose of the research, categories of news headlines were collected from multiple sources, including (a) General- news headlines dataset from Kaggle (30 samples), (b) Sarcastic - news headlines from Kaggle (10 samples), (c) Domain-specific -news headline from Wired (10 samples), and (d) Ambiguous headlines from linguistics page (15 samples). News headlines in each of these datasets were filtered for politics as a topic given the potential impact it may cause due to misinformation. From the filtered news headlines category database, samples were randomly identified for the purpose of the translation. Translations were undertaken on these samples using NLLB. A test code was created in Google Colab with NLLB pre-trained model (available on Huggingface). The translations were evaluated for mistranslations. Incomplete translations were eliminated (~27%) and translations that provided complete meaning (~73%) were examined for misinformation. A translation was classified as misinformation if it gives false information in whole or part of the news headline. For instance, “Trump