Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.11889/8137
Title: Contour-based character segmentation for printed Arabic text with diacritics
Authors: Mohammad, Khader 
Qaroush, Aziz 
Ayesh, Muna 
Washha, Mahdi 
Alsadeh, Ahmad 
Agaian, Sos 
Keywords: Optical character recognition;Character sets (Data processing);Character segmentation;Arabic words with diacritics
Issue Date: 2019
Abstract: Current developments in sensors open up new uses across numerous real-life applications, including optical character recognition (OCR). An OCR system requires text processing tools to be incorporated into the sensor functionality. The most critical stage in an OCR system is segmentation: subdividing a text image into characters that can each be processed by a classifier. Arabic character segmentation is especially challenging because the script is cursive, each character takes different shapes depending on its position in the word, and diacritics may be present. A robust offline character segmentation algorithm for printed Arabic text with diacritics is developed based on the contour extraction technique. The algorithm extracts the upper contour of a word and then identifies the splitting areas between the word's characters. A post-processing stage then handles the over-segmentation problems that appear in the initial segmentation stage. The proposed scheme is benchmarked using the APTI dataset and a manually collected dataset of text images that vary in font size, type, and style, covering more than 38,000 words. The experiments show that the proposed algorithm segments Arabic words with diacritics with an average accuracy of 98.5%.
URI: http://hdl.handle.net/20.500.11889/8137
Appears in Collections: Fulltext Publications
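
Illustrative note: the abstract describes extracting the upper contour of a word and then locating splitting areas. The Python sketch below is only a rough illustration of that general idea, not the authors' algorithm. It assumes a binarized word image (nonzero = ink) from which diacritics have already been removed, a known baseline row, and a hypothetical heuristic in which columns whose upper contour sits on the baseline are treated as candidate split areas. The names upper_contour, candidate_splits, baseline_row, and tolerance are invented for this example.

```python
import numpy as np

def upper_contour(word: np.ndarray) -> np.ndarray:
    """Row index of the topmost ink pixel in each column (-1 for empty columns)."""
    ink = word > 0
    has_ink = ink.any(axis=0)
    top = ink.argmax(axis=0)  # first True row per column (0 if the column is empty)
    return np.where(has_ink, top, -1)

def candidate_splits(word: np.ndarray, baseline_row: int, tolerance: int = 1) -> list:
    """Assumed heuristic (not from the paper): a run of columns whose upper
    contour lies within `tolerance` rows of the baseline is a candidate split
    area; the middle column of each run is returned."""
    top = upper_contour(word)
    near = (top >= 0) & (np.abs(top - baseline_row) <= tolerance)
    splits, start = [], None
    for col, flag in enumerate(near):
        if flag and start is None:
            start = col  # a run of baseline-level columns begins
        elif not flag and start is not None:
            splits.append((start + col - 1) // 2)  # midpoint of the finished run
            start = None
    if start is not None:  # run extends to the last column
        splits.append((start + len(near) - 1) // 2)
    return splits

# Toy word: two tall strokes joined by a thin connector on the baseline (row 3).
demo = np.zeros((5, 9), dtype=np.uint8)
demo[3, :] = 1       # baseline stroke across the whole word
demo[1:3, 0:3] = 1   # first "character" body
demo[1:3, 6:9] = 1   # second "character" body
print(candidate_splits(demo, baseline_row=3, tolerance=0))
```

A real system would also need the diacritic handling and the over-segmentation post-processing that the abstract mentions; neither is attempted here.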

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.