Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.11889/4485
Title: Arabic text correction using dynamic categorized dictionaries: a statistical approach
Authors: Yahya, Adnan
Salhi, Ali
Keywords: Natural language processing (Computer science)
Wikipedia
Editing
Language and languages - Computer-assisted instruction
Data mining
Issue Date: May-2013
Citation: Yahya, A. and A. Salhi. "Arabic Text Correction Using Dynamic Categorized Dictionaries: A Statistical Approach." Linguistica Communicatio Journal (Selected Papers from CITALA 2012), Volume 5, 2013.
Abstract: This paper describes a technique for spell checking and correcting Arabic text. The technique exposes several variables that can be tuned to customize the results to the properties of the processed text. It relies on dynamic dictionaries that are selected and customized according to the category of the input text. In the research reported here we employ a statistical, corpus-based approach, using data drawn from the Arabic Wikipedia and local Palestinian newspapers. From corpus statistics we constructed databases of words and their frequencies as single, double and triple expressions, and used these as the infrastructure for our spelling and text correction technique. Our spelling technique builds on earlier work [7] but adds new spelling variables and dynamic dictionaries based on categorized texts. We briefly report the results of preliminary testing and analysis. While these results are promising, they must be viewed as work in progress that still needs further testing, refinement, integration and deployment in real-life settings.
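The abstract describes counting single, double and triple word expressions per text category and using those counts to correct text. The following is a minimal Python sketch of that general idea, not the authors' implementation. It assumes the category of the input is already known, generates candidates with a Norvig-style single-edit generator (a swapped-in technique, not necessarily the one used in the paper), and ranks candidates by their bigram count with the preceding word, then by unigram count. The function names (`build_ngram_tables`, `edits1`, `correct`) and the `ARABIC_LETTERS` set are hypothetical illustrations. Trigram counts are collected but not used for ranking here.

```python
from collections import Counter

# Hypothetical letter set used to generate substitution/insertion candidates.
ARABIC_LETTERS = "ابتثجحخدذرزسشصضطظعغفقكلمنهويءآأإؤئةى"


def build_ngram_tables(corpus_by_category, max_n=3):
    """Count single, double and triple word expressions for each text category.

    Returns a dict mapping category -> Counter keyed by word tuples.
    """
    tables = {}
    for category, texts in corpus_by_category.items():
        counts = Counter()
        for text in texts:
            words = text.split()
            for n in range(1, max_n + 1):
                for i in range(len(words) - n + 1):
                    counts[tuple(words[i:i + n])] += 1
        tables[category] = counts
    return tables


def edits1(word, alphabet=ARABIC_LETTERS):
    """All strings one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in alphabet]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def correct(word, prev_word, tables, category):
    """Correct `word` using only the dictionary of the given category."""
    counts = tables[category]  # the "dynamic" choice of dictionary
    if counts[(word,)] > 0:
        return word  # known word in this category: leave it unchanged
    candidates = [w for w in edits1(word) if counts[(w,)] > 0]
    if not candidates:
        return word  # no known word within one edit
    # Prefer the candidate seen most often after prev_word; break ties by
    # the candidate's own frequency. Missing Counter keys count as 0.
    return max(candidates, key=lambda w: (counts[(prev_word, w)], counts[(w,)]))


corpus = {
    "sports": ["كرة القدم", "كرة القدم جميلة", "ملعب كرة"],
    "politics": ["القدس مدينة", "زار القدس"],
}
tables = build_ngram_tables(corpus)
print(correct("القدن", "كرة", tables, "sports"))    # القدم
print(correct("القدن", "زار", tables, "politics"))  # القدس
```

In this sketch, switching the `category` argument is what makes the dictionary dynamic: the same misspelling resolves to a sports term or a place name depending on which category's counts are consulted.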
Description: Selected for journal publication from the CITALA 2012 conference.
URI: http://hdl.handle.net/20.500.11889/4485
Appears in Collections: Fulltext Publications

Files in This Item:
File: YahyaSalhiLinguisticaComPaper.pdf
Size: 601.84 kB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.