

Poster in Affinity Workshop: Black in AI

OCR System for the Recognition of Ethiopic Real-Life Documents

Tesfahunegn Mengistu

Keywords: Natural Language Processing


Abstract:

A large body of real-life documents written in Ethiopic script contains vital information and knowledge about history, culture, economy, politics, religion, and science. This knowledge has to be shared, and advances in technologies such as Optical Character Recognition (OCR) create the need to digitize these documents and make them available for public use. OCR is a process that allows printed, typewritten, and handwritten text to be recognized optically and converted into a machine-readable format that a computer can process further. Effective OCR systems have been developed for widely used languages such as English. Research on Amharic OCR has been ongoing since 1997, and several attempts have been made to adapt recognition algorithms to Amharic. This study is an attempt to develop an OCR system for real-life documents written in Ethiopic characters. We propose a novel feature extraction scheme using Gabor filters and Principal Component Analysis (PCA), followed by a Genetic Algorithm (GA)-based support vector machine (SVM) classifier. The prototype was tested on real-life Ethiopic documents such as books, newspapers, and magazines, achieving an average recognition accuracy of 98.33% for Ethiopic characters.
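The Gabor-plus-PCA feature extraction stage mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the author's implementation: it assumes fixed-size grayscale character images and uses illustrative settings (four Gabor orientations, a 9×9 kernel, σ = 2, wavelength 4, 4×4 average pooling) that do not come from the paper. The function names `gabor_kernel`, `filter2d_same`, `gabor_features`, `fit_pca`, and `project` are hypothetical, and the GA-based SVM classification stage is not shown.

```python
import numpy as np

def gabor_kernel(ksize, sigma, theta, lambd, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor kernel of shape (ksize, ksize); ksize should be odd."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinate grid by the filter orientation theta.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r**2 + (gamma * y_r)**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_r / lambd + psi)
    return envelope * carrier

def filter2d_same(image, kernel):
    """Zero-padded 2-D correlation whose output has the same shape as the image."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image.astype(float), ((ph, ph), (pw, pw)))
    out = np.zeros(image.shape, dtype=float)
    h, w = image.shape
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out

def gabor_features(image, n_orientations=4, ksize=9, sigma=2.0, lambd=4.0, pool=4):
    """Concatenate pooled Gabor response magnitudes over several orientations."""
    feats = []
    for k in range(n_orientations):
        theta = k * np.pi / n_orientations
        resp = np.abs(filter2d_same(image, gabor_kernel(ksize, sigma, theta, lambd)))
        h, w = resp.shape
        # Average-pool over non-overlapping pool x pool pixel blocks.
        resp = resp[: h - h % pool, : w - w % pool]
        blocks = resp.reshape(h // pool, pool, w // pool, pool).mean(axis=(1, 3))
        feats.append(blocks.ravel())
    return np.concatenate(feats)

def fit_pca(X, n_components):
    """Return the feature mean and the top principal axes (rows), via SVD."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal axes sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(X, mean, components):
    """Project centered feature vectors onto the principal axes."""
    return (X - mean) @ components.T

# Random arrays stand in for 20 segmented 32x32 character images (not real data).
rng = np.random.default_rng(0)
glyphs = rng.random((20, 32, 32))
X = np.stack([gabor_features(g) for g in glyphs])
mean, comps = fit_pca(X, n_components=10)
Z = project(X, mean, comps)
print(X.shape, Z.shape)  # (20, 256) (20, 10)
```

Pooling the filter responses before PCA keeps the raw feature vector small (256 values per glyph here), and PCA then compresses it further into a few decorrelated components. In a full pipeline, the reduced vectors `Z` would be passed to the classifier, which in the abstract is a GA-based SVM.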
