Title: Arabic text correction using dynamic categorized dictionaries: a statistical approach
Authors: Yahya, Adnan
Salhi, Ali
Keywords: Natural language processing (Computer science); Wikipedia; Editing; Language and languages - Computer-assisted instruction; Data mining
Issue Date: May-2013
Source: Yahya, A. and A. Salhi. "Arabic Text Correction Using Dynamic Categorized Dictionaries: A Statistical Approach." Linguistica Communicatio Journal (Selected Papers from CITALA 2012), Volume 5, 2013.
Abstract: This paper describes a technique for spell-checking and correcting Arabic text that exposes several variables which can be tuned to customize the results according to the properties of the processed text. The proposed technique relies on dynamic dictionaries that are selected and customized according to the category of the input text. In the research reported here we employ a statistical, corpus-based approach using data obtained from the Arabic Wikipedia and local Palestinian newspapers. From corpus statistics we constructed databases of words and their frequencies as single words and as two- and three-word expressions, and used these databases as the infrastructure for our spelling and text correction technique. Our spelling technique builds on earlier work [7] but uses new spelling variables and dynamic dictionaries built from categorized texts. We briefly report the results of preliminary testing and analysis. While these results are promising, they should be viewed as work in progress that still requires further testing, refinement, integration and deployment in real-life settings.
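The abstract describes per-category dictionaries of single words and two- and three-word expressions with their frequencies, used as the basis for correction. The Python sketch below illustrates that general idea only; it is not the authors' method. Several parts are assumptions: whitespace tokenization without Arabic normalization, candidate generation by single-character edits (a Norvig-style technique swapped in for illustration), the ranking rule (bigram context first, then word frequency), the letter set, and the names `CategorizedDictionary`, `edits1` and `correct`, which are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

# Assumption: a basic Arabic letter set used to generate candidate spellings.
ARABIC_LETTERS = "ءآأؤإئابةتثجحخدذرزسشصضطظعغفقكلمنهوىي"


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count every contiguous sequence of n tokens (n = 1, 2 or 3)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


class CategorizedDictionary:
    """Hypothetical store of word, two-word and three-word frequencies for one category."""

    def __init__(self, texts: List[str]) -> None:
        self.uni: Counter = Counter()
        self.bi: Counter = Counter()
        self.tri: Counter = Counter()  # collected, but unused by the ranking below
        for text in texts:
            tokens = text.split()  # assumption: whitespace tokenization, no normalization
            self.uni.update(ngram_counts(tokens, 1))
            self.bi.update(ngram_counts(tokens, 2))
            self.tri.update(ngram_counts(tokens, 3))

    def known(self, word: str) -> bool:
        return (word,) in self.uni


def edits1(word: str) -> Set[str]:
    """All strings one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in ARABIC_LETTERS}
    inserts = {a + c + b for a, b in splits for c in ARABIC_LETTERS}
    return deletes | transposes | replaces | inserts


def correct(word: str, prev: Optional[str],
            dictionaries: Dict[str, CategorizedDictionary], category: str) -> str:
    """Return the best in-dictionary candidate for word using the category's dictionary."""
    d = dictionaries[category]  # the dictionary is chosen by the text's category
    if d.known(word):
        return word
    # Sorted so that ties between equally scored candidates are broken deterministically.
    candidates = sorted(c for c in edits1(word) if d.known(c))
    if not candidates:
        return word  # no close match in this category: leave the word unchanged

    def score(c: str) -> Tuple[int, int]:
        # Prefer candidates seen after the previous word, then more frequent words.
        return (d.bi[(prev, c)] if prev else 0, d.uni[(c,)])

    return max(candidates, key=score)


if __name__ == "__main__":
    dictionaries = {
        "sports": CategorizedDictionary(["فاز الفريق في المباراة",
                                         "سجل الفريق هدفا في المباراة"]),
        "politics": CategorizedDictionary(["اجتمع الوزير مع الرئيس"]),
    }
    print(correct("الفريك", "فاز", dictionaries, "sports"))
    print(correct("الفريك", "فاز", dictionaries, "politics"))
```

With these toy dictionaries, the misspelled الفريك is corrected to الفريق when the sports dictionary is selected, but is left unchanged with the politics dictionary, which contains no word within one edit. This mirrors, in a very simplified way, why the choice of category-specific dictionary changes the correction result.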
Description: Selected for Journal Publication from CITALA12 Conference.
Appears in Collections: Fulltext Publications

Files in This Item:
YahyaSalhiLinguisticaComPaper.pdf, 601.84 kB, Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.