Segmentation-based, omnifont printed Arabic character recognition without font identification

Qaroush, Aziz; Awad, Abdalkarim; Modallal, Mohammad; Ziq, Malik

Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.11889/6680

DC Field	Value	Language
dc.contributor.author	Qaroush, Aziz	en_US
dc.contributor.author	Awad, Abdalkarim	en_US
dc.contributor.author	Modallal, Mohammad	en_US
dc.contributor.author	Ziq, Malik	en_US
dc.date.accessioned	2021-03-09T08:56:55Z	-
dc.date.available	2021-03-09T08:56:55Z	-
dc.date.issued	2020-10-10	-
dc.identifier.uri	http://hdl.handle.net/20.500.11889/6680	-
dc.description.abstract	Optical Character Recognition (OCR) is an essential part of many real-world applications such as digital archiving, automatic number plate recognition, and cheque handling. However, developing an OCR system for printed Arabic text is still a challenging and open research field due to the special characteristics of Arabic cursive script. In this paper, we propose a segmentation-based, omnifont, open-vocabulary OCR for printed Arabic text. The proposed approach does not require an explicit font-type recognition stage. It uses an explicit, indirect character segmentation method. The presented segmentation method is baseline-dependent and employs a hybrid, three-step character segmentation algorithm to handle the problem of character overlapping. In addition, it uses a set of topological features that are designed and generalized to make the segmentation approach font-independent. The segmented characters are fed as input to a convolutional neural network for feature extraction and recognition. The APTID-MF data set has been used for testing and evaluation. The average accuracy of the proposed segmentation stage is 95%, while the average accuracy of the recognition stage is 99.97%. The whole approach achieves an average accuracy of 95% without using font-type recognition or any post-processing techniques.	en_US
dc.subject	Optical character recognition	en_US
dc.subject	Arabic character sets (Data processing)	en_US
dc.subject	Mono-font	en_US
dc.subject	Mixed-font	en_US
dc.subject	Pattern recognition systems	en_US
dc.subject	Image segmentation	en_US
dc.subject	Character segmentation	en_US
dc.subject	Image processing - Digital techniques	en_US
dc.subject	Convolutional neural networks	en_US
dc.title	Segmentation-based, omnifont printed Arabic character recognition without font identification	en_US
dcterms.identifier	https://doi.org/10.1016/j.jksuci.2020.10.001	en_US
newfileds.department	Engineering and Technology	en_US
newfileds.item-access-type	open_access	en_US
newfileds.thesis-prog	none	en_US
newfileds.general-subject	none	en_US
dc.identifier.doi	https://doi.org/10.1016/j.jksuci.2020.10.001	-
item.grantfulltext	open	-
item.fulltext	With Fulltext	-
Appears in Collections:	6. BZU Dataset Collection

Files in This Item:

File	Description	Size	Format
1-s2.0-S131915782030481X-main.pdf		2.56 MB	Adobe PDF

Page view(s): 148 (checked on Apr 14, 2024)
Download(s): 126 (checked on Apr 14, 2024)
