Please use this identifier to cite or link to this item:
http://hdl.handle.net/20.500.11889/5888
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Jarrar, Mustafa | - |
dc.contributor.author | Zaraket, Fadi | - |
dc.contributor.author | Asia, Rami | - |
dc.contributor.author | Amayreh, Hamzeh | - |
dc.date.accessioned | 2019-03-26T06:28:48Z | - |
dc.date.available | 2019-03-26T06:28:48Z | - |
dc.date.issued | 2018-12 | - |
dc.identifier.citation | Mustafa Jarrar, Fadi Zaraket, Rami Asia, Hamzeh Amayreh: Diacritic-Based Matching of Arabic Words. ACM Asian and Low-Resource Language Information Processing. Volume 18, No 2, Pages 10:1-10:21, ACM, December 2018. ISSN 2375-4699. | en_US |
dc.identifier.issn | 2375-4699 | - |
dc.identifier.uri | http://hdl.handle.net/20.500.11889/5888 | - |
dc.description.abstract | Words in Arabic consist of letters and short vowel symbols called diacritics inscribed atop regular letters. Changing diacritics may change the syntax and semantics of a word, turning it into another. This results in difficulties when comparing words based solely on string matching. Typically, Arabic NLP applications resort to morphological analysis to battle ambiguity originating from this and other challenges. In this paper, we introduce three alternative algorithms to compare two words with possibly different diacritics. We propose the Subsume knowledge-based algorithm, the Imply rule-based algorithm, and the Alike machine-learning-based algorithm. We evaluated the soundness, completeness, and accuracy of the algorithms against a large dataset of 86,886 word pairs. Our evaluation shows accuracies of 100% for Subsume, 99.32% for Imply, and 99.53% for Alike. Although accurate, Subsume was able to judge only 75% of the data. Both Subsume and Imply are sound, while Alike is not. We demonstrate the utility of the algorithms using a real-life use case in lemma disambiguation and in linking hundreds of Arabic dictionaries. | en_US |
dc.language.iso | en_US | en_US |
dc.publisher | ACM | en_US |
dc.relation.ispartofseries | Vol. 18, No 2; | - |
dc.subject | Natural language processing (Computer science) | en_US |
dc.subject | Phonology, Arabic - Data processing | en_US |
dc.subject | Grammar, Arabic - Phonology | en_US |
dc.subject | Language resources | en_US |
dc.subject | Computational linguistics | en_US |
dc.subject | Diacritics | en_US |
dc.subject | Disambiguation | en_US |
dc.subject | Ambiguity - Data processing | en_US |
dc.title | Diacritic-Based Matching of Arabic Words | en_US |
dc.type | Article | en_US |
newfileds.department | Engineering and Technology | en_US |
newfileds.conference | ACM Asian and Low-Resource Language Information Processing | en_US |
newfileds.item-access-type | open_access | en_US |
newfileds.thesis-prog | none | en_US |
newfileds.general-subject | Computers and Information Technology | الحاسوب وتكنولوجيا المعلومات | en_US |
item.grantfulltext | open | - |
item.languageiso639-1 | other | - |
item.fulltext | With Fulltext | - |
Appears in Collections: | Fulltext Publications |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
JZAA18.pdf | Article | 5.98 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.