Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.11889/6743
Title: A Dataset for Authorship Analysis of Short Modern Arabic Text
Other Titles: Authorship Attribution of Arabic Tweets Dataset
Authors: Addabe', Yara 
Abu Hammad, Yara 
Ayyad, Nataly 
Yahya, Adnan 
Keywords: Tweets, Arabic;Authorship attribution;Authorship - Data processing;Computational linguistics;Authorship analysis;Information storage and retrieval systems;Tweets - Authorship analysis
Issue Date: 11-Feb-2021
Publisher: BZU-ECE Department
Source: Yara Addabe', Yara Abu Hammad, Nataly Ayyad and Adnan Yahya. A Dataset for Authorship Analysis of Short Modern Arabic Text. Graduation Project. Department of Electrical and Computer Engineering. Birzeit University. 2021
Abstract: The collection contains 71,391 Arabic tweets written by 44 Arab authors from 13 Arab countries, plus tweets by Pope Francis in Modern Standard Arabic (MSA). The tweets were collected with Tweepy, a Python client for the Twitter API. Tweet topics: politics, journalism, religion, and Arabic literature. The tweets were preprocessed as follows: retweets and replies were removed; any tweet containing the author's name was removed; emojis, hyperlinks, and English hashtags were filtered out; numbers were normalized; and English and other non-Arabic characters were filtered out. Stop words were then removed, and the tokens were lemmatized and normalized. The two files contain the training and testing data used in our project.
URI: http://hdl.handle.net/20.500.11889/6743
Appears in Collections: 6. BZU Dataset Collection
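
The cleaning steps listed in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the regular expressions, the text-prefix test for retweets and replies, and the choice to normalize numbers by mapping Arabic-Indic digits to ASCII digits are all assumptions made for illustration, since the record does not say how these steps were implemented. Stop-word removal and lemmatization are omitted because they need external language resources.

```python
import re

# All patterns below are illustrative assumptions, not the authors' rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
LATIN_HASHTAG_RE = re.compile(r"#[A-Za-z0-9_]+")
# Matches anything that is not an ASCII digit, whitespace, or an Arabic
# letter/diacritic (U+0621..U+0652). Emojis and Latin letters fall outside.
NON_ARABIC_RE = re.compile(r"[^0-9\s\u0621-\u0652]")
# Assumed number normalization: Arabic-Indic digits -> ASCII digits.
ARABIC_INDIC_DIGITS = str.maketrans("٠١٢٣٤٥٦٧٨٩", "0123456789")


def is_retweet_or_reply(text: str) -> bool:
    """Heuristic on the raw text; real pipelines may use tweet metadata."""
    stripped = text.lstrip()
    return stripped.startswith("RT @") or stripped.startswith("@")


def clean_tweet(text: str) -> str:
    """Drop links, English hashtags, emojis and other non-Arabic characters."""
    text = URL_RE.sub(" ", text)
    text = LATIN_HASHTAG_RE.sub(" ", text)
    text = text.translate(ARABIC_INDIC_DIGITS)
    text = NON_ARABIC_RE.sub(" ", text)
    return " ".join(text.split())  # collapse runs of whitespace


def preprocess(tweets, author_name):
    """Filter out retweets, replies and tweets naming the author, then clean."""
    kept = []
    for tweet in tweets:
        if is_retweet_or_reply(tweet) or author_name in tweet:
            continue
        cleaned = clean_tweet(tweet)
        if cleaned:
            kept.append(cleaned)
    return kept
```

Calling `preprocess(list_of_tweets, "some author name")` returns the cleaned tweets that survive the filters. Replacing disallowed characters with a space rather than deleting them keeps the words of an Arabic hashtag separated once the `#` and `_` characters are dropped.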

Files in This Item:
File | Description | Size | Format
AuthorAttributionTweetsTrainingDataYara_2_Nataly.xlsx | Training Dataset Tweets | 4.14 MB | Microsoft Excel XML
AuthorAttributionTweetsTestDataYara_2_Nataly.xlsx | Testing Dataset Tweets | 1.06 MB | Microsoft Excel XML

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.