Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.11889/5146
Title: Document similarity for Arabic and cross-lingual web content
Authors: Salhi, Ali
Yahya, Adnan
Keywords: Computational linguistics - Mathematical models
Information visualization
Information storage and retrieval - Artificial intelligence
Text processing (Computer science) - Arab countries
Issue Date: 2011
Publisher: Springer Verlag
Abstract: Document similarity is fundamental to Information Retrieval. Cross-Lingual (CL) similarity is important for many data processing tasks, such as CL plagiarism detection and retrieval, and document quality assessment. We study CL similarity based on Explicit Semantic Analysis (ESA), adapted to a cross-lingual setting with a focus on Arabic. We compare how well CL similarity testing performs when one of the languages is Arabic against its monolingual counterpart, for various text chunk sizes. We describe the infrastructure used, report on some of the testing results, study the possible sources of the weaknesses encountered, and point to possible directions for improvement.
URI: http://hdl.handle.net/20.500.11889/5146
Appears in Collections: Fulltext Publications

Files in This Item:
File: ICALP17Document Similarity for Arabic and CrossEarlyDraft.pdf
Size: 648.96 kB
Format: Adobe PDF

