Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.11889/5888
DC Field | Value | Language
dc.contributor.author | Jarrar, Mustafa | -
dc.contributor.author | Zaraket, Fadi | -
dc.contributor.author | Asia, Rami | -
dc.contributor.author | Amayreh, Hamzeh | -
dc.date.accessioned | 2019-03-26T06:28:48Z | -
dc.date.available | 2019-03-26T06:28:48Z | -
dc.date.issued | 2018-12 | -
dc.identifier.citation | Mustafa Jarrar, Fadi Zaraket, Rami Asia, Hamzeh Amayreh: Diacritic-Based Matching of Arabic Words. ACM Asian and Low-Resource Language Information Processing. Volume 18, No 2, Pages(10:1--10:21), ACM, December 2018. ISSN 2375-4699. | en_US
dc.identifier.issn | 2375-4699 | -
dc.identifier.uri | http://hdl.handle.net/20.500.11889/5888 | -
dc.description.abstract | Words in Arabic consist of letters and short vowel symbols called diacritics inscribed atop regular letters. Changing diacritics may change the syntax and semantics of a word, turning it into another. This results in difficulties when comparing words based solely on string matching. Typically, Arabic NLP applications resort to morphological analysis to battle ambiguity originating from this and other challenges. In this paper, we introduce three alternative algorithms to compare two words with possibly different diacritics. We propose the Subsume knowledge-based algorithm, the Imply rule-based algorithm, and the Alike machine-learning-based algorithm. We evaluated the soundness, completeness, and accuracy of the algorithms against a large dataset of 86,886 word pairs. Our evaluation shows the accuracy of Subsume (100%), Imply (99.32%), and Alike (99.53%). Although accurate, Subsume was able to judge only 75% of the data. Both Subsume and Imply are sound, while Alike is not. We demonstrate the utility of the algorithms using a real-life use case in lemma disambiguation and in linking hundreds of Arabic dictionaries. | en_US
dc.language.iso | en_US | en_US
dc.publisher | ACM | en_US
dc.relation.ispartofseries | Vol. 18, No 2; | -
dc.subject | Natural language processing (Computer science) | en_US
dc.subject | Phonology, Arabic - Data processing | en_US
dc.subject | Grammar, Arabic - Phonology | en_US
dc.subject | Language resources | en_US
dc.subject | Computational linguistics | en_US
dc.subject | Diacritics | en_US
dc.subject | Disambiguation | en_US
dc.subject | Ambiguity - Data processing | en_US
dc.title | Diacritic-Based Matching of Arabic Words | en_US
dc.type | Article | en_US
newfileds.department | Engineering and Technology | en_US
newfileds.conference | ACM Asian and Low-Resource Language Information Processing | en_US
newfileds.item-access-type | open_access | en_US
newfileds.thesis-prog | none | en_US
newfileds.general-subject | Computers and Information Technology | الحاسوب وتكنولوجيا المعلومات | en_US
item.grantfulltext | open | -
item.languageiso639-1 | other | -
item.fulltext | With Fulltext | -
Appears in Collections: Fulltext Publications
Files in This Item:
File | Description | Size | Format
JZAA18.pdf | Article | 5.98 MB | Adobe PDF



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.