Title: Models for Arabic document quality assessment
Authors: Yahya, Adnan 
Ahmad, Afnan 
Assaf, Alaa 
Khater, Rawan 
Salhi, Ali 
Keywords: Information services - Quality control; Archives - Arab countries; Information retrieval - Arab countries; Wikipedia; Archives - Processing; Machine learning; Information retrieval - Reliability
Issue Date: 27-Aug-2020
Publisher: Springer
Source: Adnan Yahya and Afnan Ahmad and Alaa Assaf and Rawan Khater and Ali Salhi. Models for Arabic Document Quality Assessment. 3rd Workshop on Quality of Open Data (QOD 2020). June 8-10, 2020. Colorado Springs, USA.
Conference: 3rd Workshop on Quality of Open Data (QOD 2020), CO, USA 
Abstract: Digital content is growing rapidly. Because anyone can generate, access and use this content, assessing the quality of web content before use has become an important issue. This paper focuses on devising methods to assess the quality of Arabic digital content. Our work was partially based on Wikipedia articles annotated as featured and good according to Wikipedia's quality guidelines. Our analysis was directed at finding the features that serve as the best quality indicators. Using the defined features, we trained a high-accuracy quality assessment model using machine-learning algorithms. Our work went beyond Wikipedia documents to build a general model that can assess, with acceptable accuracy, the quality of Arabic documents that lack Wikipedia metadata. This model was trained on features from documents we collected from Arabic online news sites and blogs and annotated in collaboration with university students.
Appears in Collections: Fulltext Publications

Files in This Item:
File: ArabicDocumentQualityAssessmentPaperAdnanYahyaQOD2020SemiFinal.pdf
Description: Paper Text, Prefinal
Size: 956.93 kB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.