Arabic text categorization based on Arabic Wikipedia

Yahya, Adnan; Salhi, Ali

Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.11889/4443

DC Field	Value	Language
dc.contributor.author	Yahya, Adnan
dc.contributor.author	Salhi, Ali
dc.date.accessioned	2017-03-09T07:13:15Z
dc.date.available	2017-03-09T07:13:15Z
dc.date.issued	2014-02-01
dc.identifier.citation	Adnan Yahya and Ali Salhi. 2014. Arabic Text Categorization Based on Arabic Wikipedia. ACM Transactions on Asian Language Information Processing 13, 1, Article 4 (February 2014), 20 pages. DOI: http://dx.doi.org/10.1145/2537129	en_US
dc.identifier.uri	http://hdl.handle.net/20.500.11889/4443
dc.description	ACM Transactions on Asian Language Information Processing. Vol. 13, No. 1, Article 4. February 2014
dc.description.abstract	This paper describes an algorithm for categorizing Arabic text. The algorithm relies on highly categorized, corpus-based data sets obtained from the Arabic Wikipedia, using manual and automated processes to build and customize categories. The categorization algorithm was built by adopting a simple categorization idea and then moving to a more complex one. We applied tests and filtration criteria to arrive at the best and most efficient results that our algorithm can achieve. The categorization depends on the statistical relation between the input text and the reference (training) data, supported by well-defined Wikipedia-based categories. Our algorithm supports two levels of categorization for Arabic text: categories are grouped into a hierarchy of main categories and subcategories. This introduces a challenge due to the correlation between certain subcategories and the overlap between main categories. We argue that our algorithm achieved good performance compared to other methods reported in the literature.	en_US
dc.language.iso	en_US	en_US
dc.publisher	ACM: Association for Computing Machinery	en_US
dc.relation.ispartofseries	doi>10.1145/2540989;
dc.subject	Natural language processing (Computer science)	en_US
dc.subject	Computer network resources - Arab countries	en_US
dc.subject	Computational linguistics	en_US
dc.subject	Linguistics - Databases	en_US
dc.subject	Text processing (Computer science)	en_US
dc.subject.lcsh	Wikipedia
dc.title	Arabic text categorization based on Arabic Wikipedia	en_US
dc.type	Article	en_US
newfileds.department	Engineering and Technology	en_US
newfileds.custom-issue-date	Vol. 13, No. 1, Article 4. February 2014	en_US
newfileds.conference	ACM Transactions on Asian Language Information Processing. Vol. 13, No. 1, Article 4. February 2014	en_US
newfileds.item-access-type	bzu	en_US
newfileds.thesis-prog	none	en_US
newfileds.general-subject	Computers and Information Technology | الحاسوب وتكنولوجيا المعلومات	en_US
item.languageiso639-1	other	-
item.fulltext	With Fulltext	-
item.grantfulltext	open	-
Appears in Collections:	Fulltext Publications

Files in This Item:

File	Description	Size	Format
TALIP_PreFinalCopy.pdf		651.3 kB	Adobe PDF
