A Dataset for Authorship Analysis of Short Modern Arabic Text

Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.11889/6743

Title:	A Dataset for Authorship Analysis of Short Modern Arabic Text
Other Titles:	Authorship Attribution of Arabic Tweets Dataset
Authors:	Addabe', Yara; Abu Hammad, Yara; Ayyad, Nataly; Yahya, Adnan
Keywords:	Tweets, Arabic; Authorship attribution; Authorship - Data processing; Computational linguistics; Authorship analysis; Information storage and retrieval systems; Tweets - Authorship analysis
Issue Date:	11-Feb-2021
Publisher:	BZU-ECE Department
Source:	Yara Addabe', Yara Abu Hammad, Nataly Ayyad and Adnan Yahya. A Dataset for Authorship Analysis of Short Modern Arabic Text. Graduation Project. Department of Electrical and Computer Engineering. Birzeit University. 2021
Abstract:	The collection contains 71,391 Arabic tweets written by 44 Arab authors from 13 Arab countries, plus tweets by Pope Francis in Modern Standard Arabic (MSA). The tweets were scraped with the Tweepy API and cover four topics: politics, journalism, religion, and Arabic literature. Preprocessing removed retweets, replies, and any tweet containing the author's name; filtered out emojis, hyperlinks, English hashtags, and English and other non-Arabic characters; and normalized numbers. Stop words were then removed, and the remaining tokens were lemmatized and normalized. The two files contain the training and testing data used in our project.
URI:	http://hdl.handle.net/20.500.11889/6743
Appears in Collections:	6. BZU Dataset Collection
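
The filtering steps listed in the abstract can be illustrated with a minimal Python sketch. This is not the project's actual code: the regular expressions, the function names (is_retweet_or_reply, clean_tweet, preprocess) and the number placeholder NUM_TOKEN are all assumptions made for illustration, and stop-word removal and lemmatization are left out because the record does not say which tools were used.

```python
import re

# All patterns below are illustrative assumptions, not the project's own rules.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
LATIN_HASHTAG_RE = re.compile(r"#[A-Za-z0-9_]+")
NUMBER_RE = re.compile(r"[0-9\u0660-\u0669]+")      # Western and Arabic-Indic digits
NON_ARABIC_RE = re.compile(r"[^\u0621-\u0652\s]")   # keep Arabic letters, diacritics, whitespace
NUM_TOKEN = "رقم"  # hypothetical placeholder that every number is mapped to


def is_retweet_or_reply(text):
    """Heuristic: retweets start with 'RT @', replies start with '@'."""
    stripped = text.lstrip()
    return stripped.startswith("RT @") or stripped.startswith("@")


def clean_tweet(text):
    """Drop links and English hashtags, normalize numbers, keep Arabic text only."""
    text = URL_RE.sub(" ", text)
    text = LATIN_HASHTAG_RE.sub(" ", text)
    text = NUMBER_RE.sub(f" {NUM_TOKEN} ", text)
    text = NON_ARABIC_RE.sub(" ", text)  # also removes emojis and Latin letters
    return " ".join(text.split())


def preprocess(tweets, author_name):
    """Apply the tweet-level filters, then clean the tweets that remain."""
    cleaned = []
    for text in tweets:
        if is_retweet_or_reply(text) or author_name in text:
            continue
        result = clean_tweet(text)
        if result:
            cleaned.append(result)
    return cleaned
```

Numbers are replaced before the non-Arabic filter runs so that the placeholder survives; the record only says that numbers were "normalized", so mapping them to a single token is one possible reading.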

File	Description	Size	Format
AuthorAttributionTweetsTrainingDataYara_2_Nataly.xlsx	Training Dataset Tweets	4.14 MB	Microsoft Excel XML
AuthorAttributionTweetsTestDataYara_2_Nataly.xlsx	Testing Dataset Tweets	1.06 MB	Microsoft Excel XML
