Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.11889/4394
Title: Enhancement tools for Arabic web search : a statistical approach
Authors: Salhi, Ali
Yahya, Adnan
Keywords: Natural language processing (Computer science)
Information retrieval - Arabic countries
Information storage and retrieval systems
Computational linguistics
Arabic language - Roots
Issue Date: 25-Apr-2011
Abstract: Arabic web content is growing rapidly, and the need to manage it efficiently is gaining importance; the morphological complexity of Arabic raises many challenges in this regard. This paper reports on some of our work aimed at designing text mining and query pre-processing tools that can efficiently process and search large quantities of Arabic web data. In our research we try to address the challenges Arabic poses for natural language processing (NLP) and information retrieval, including root extraction, language detection, and Arabic query correction, suggestion, and expansion. While not reported in detail here, we are also developing tools for automatic Arabic document categorization. Throughout, we employ a statistical, corpus-based approach that relies on data obtained from a variety of sources. From corpus statistics we constructed databases of words and their frequencies as single, double, and triple expressions, and used these as the infrastructure for well-structured search aid tools that can handle the sophisticated nature of Arabic and can be integrated into existing web search engines and document processing systems. We also apply context analysis and spellchecking to user queries to enable a more complete and efficient search. While the results reported here are promising, they must be viewed as work in progress, still in need of testing, refinement, integration, and deployment in real-life settings.
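As a rough illustration of the frequency databases of single, double, and triple expressions mentioned in the abstract, the following minimal Python sketch counts word n-grams and uses bigram counts to rank query continuations. It is not the authors' implementation: the function names `build_ngram_counts` and `suggest_next`, the whitespace tokenization, and the three-sentence toy corpus are all assumptions made for illustration only.

```python
from collections import Counter


def build_ngram_counts(sentences, max_n=3):
    """Count 1-, 2-, and 3-word expressions in whitespace-tokenized text.

    Hypothetical helper: real Arabic text would need normalization and a
    proper tokenizer, which this sketch does not attempt.
    """
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for sentence in sentences:
        tokens = sentence.split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[n][tuple(tokens[i:i + n])] += 1
    return counts


def suggest_next(counts, query, k=3):
    """Rank words that follow the query's last word by bigram frequency."""
    tokens = query.split()
    if not tokens:
        return []
    last = tokens[-1]
    candidates = [
        (gram[1], freq) for gram, freq in counts[2].items() if gram[0] == last
    ]
    # Most frequent first; ties broken alphabetically for a stable order.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [word for word, _ in candidates[:k]]


# Toy corpus invented for this example (not data from the paper).
corpus = [
    "اللغة العربية جميلة",
    "اللغة العربية غنية",
    "اللغة العربية جميلة جدا",
]
counts = build_ngram_counts(corpus)
suggestions = suggest_next(counts, "اللغة العربية")
```

Keying each table by a tuple of tokens keeps the single, double, and triple expressions in one uniform structure, so a suggestion or correction step could fall back from trigram to bigram to unigram counts when the longer context has not been seen.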
URI: http://hdl.handle.net/20.500.11889/4394
Appears in Collections: Fulltext Publications

Files in This Item:
File: UAEInnovationsConferencePaperJanuary2011.pdf
Size: 332.03 kB
Format: Adobe PDF

