Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.11889/8363
Title: An efficient, font independent word and character segmentation algorithm for printed Arabic text
Authors: Qaroush, Aziz 
Jaber, Bassam 
Mohammad, Khader 
Washaha, Mahdi 
Maali, Eman 
Nayef, Nibal 
Keywords: Arabic OCR;Machine learning;Neural networks (Computer science);Optical Character Recognition;Word segmentation;Speech perception - Arabic language;Character segmentation;Natural language processing (Computer science);Image analysis;Image processing - Digital techniques;Segmentation techniques;Baseline;Projection profile
Issue Date: 2022
Publisher: Journal of King Saud University - Computer and Information Sciences
Abstract: Character segmentation is a necessary and the most critical stage in an Arabic OCR system, and it has attracted the interest of a wide range of researchers. However, the cursive nature of Arabic script poses extra challenges that need further investigation. A reliable and efficient Arabic OCR system that is independent of font variations is therefore highly desirable. In this paper, an indirect, font-independent word and character segmentation algorithm for printed Arabic text is investigated. The proposed algorithm takes a binary line image as input and produces as output a set of binary images, each consisting of one character or ligature. Segmentation is performed at two levels. At the first level, word segmentation applies a vertical projection to the input line image and uses the Interquartile Range (IQR) method to differentiate between gaps between words and gaps within words. At the second level, a projection profile method is used together with a set of font-independent statistical and topological features to identify the correct segmentation points among all potential points. The APTI dataset was used to test the proposed algorithm with a variety of font types, sizes, and styles. The algorithm was evaluated on 1800 lines (approximately 24,816 words) and achieved an average accuracy of 97.7% for word segmentation and 97.51% for character segmentation.
URI: http://hdl.handle.net/20.500.11889/8363
DOI: 10.1016/j.jksuci.2019.08.013
Appears in Collections: Fulltext Publications
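
The word-level step described in the abstract (a vertical projection of the line image, followed by an IQR test on gap widths) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the representation of the image as rows of 0/1 values, and the outlier rule `Q3 + k * IQR` with `k = 1.5` are all assumptions made for illustration, since the abstract does not state the exact threshold used.

```python
from statistics import quantiles


def vertical_projection(line):
    """Count ink pixels (value 1) in each column of a binary line image."""
    return [sum(column) for column in zip(*line)]


def gap_runs(profile):
    """Return (start, width) for every run of empty columns that lies
    between two ink columns. Leading and trailing margins are ignored."""
    gaps = []
    start = None
    seen_ink = False
    for x, count in enumerate(profile):
        if count == 0:
            if seen_ink and start is None:
                start = x
        else:
            if start is not None:
                gaps.append((start, x - start))
                start = None
            seen_ink = True
    return gaps


def word_gaps(gaps, k=1.5):
    """Keep the gaps whose width is an upper outlier by the IQR rule.

    Assumption: a gap counts as a word gap when its width exceeds
    Q3 + k * IQR; the paper's actual decision rule may differ.
    """
    widths = [width for _, width in gaps]
    if len(widths) < 2:
        return []
    q1, _, q3 = quantiles(widths, n=4, method="inclusive")
    threshold = q3 + k * (q3 - q1)
    return [(start, width) for start, width in gaps if width > threshold]
```

Calling `word_gaps(gap_runs(vertical_projection(line)))` on a binary line image returns the column ranges treated as word boundaries under these assumptions. The upper-outlier rule presumes that most gaps in a line are narrow within-word gaps, which will not hold for every line, so the threshold would likely need tuning on real data.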

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.