Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.11889/6743
DC Field	Value	Language
dc.contributor.author	Addabe', Yara	en_US
dc.contributor.author	Abu Hammad, Yara	en_US
dc.contributor.author	Ayyad, Nataly	en_US
dc.contributor.author	Yahya, Adnan	en_US
dc.date.accessioned	2021-04-21T05:41:38Z	-
dc.date.available	2021-04-21T05:41:38Z	-
dc.date.issued	2021-02-11	-
dc.identifier.citation	Yara Addabe', Yara Abu Hammad, Nataly Ayyad and Adnan Yahya. A Dataset for Authorship Analysis of Short Modern Arabic Text. Graduation Project. Department of Electrical and Computer Engineering. Birzeit University. 2021.	en_US
dc.identifier.uri	http://hdl.handle.net/20.500.11889/6743	-
dc.description.abstract	The collection has 71,391 Arabic tweets written by 44 Arab authors from 13 Arab countries, plus tweets by Pope Francis in MSA (Modern Standard Arabic). The tweets were scraped with Tweepy. Tweet topics: politics, journalism, religion, and Arabic literature. The tweets were preprocessed as follows: retweets and replies were removed; any tweet containing the author’s name was removed; emojis, hyperlinks, and English hashtags were filtered out; numbers were normalized; and English and other non-Arabic characters were filtered out. Then stop words were removed, and tokens were lemmatized and normalized. The two files contain the training and testing data used in our project.	en_US
dc.description.sponsorship	Birzeit University	en_US
dc.language.iso	ar	en_US
dc.publisher	BZU-ECE Department	en_US
dc.subject	Tweets, Arabic	en_US
dc.subject	Authorship attribution	en_US
dc.subject	Authorship - Data processing	en_US
dc.subject	Computational linguistics	en_US
dc.subject	Authorship analysis	en_US
dc.subject	Information storage and retrieval systems	en_US
dc.subject	Tweets - Authorship analysis	en_US
dc.title	A Dataset for Authorship Analysis of Short Modern Arabic Text	en_US
dc.title.alternative	Authorship Attribution of Arabic Tweets Dataset	en_US
dc.type	Dataset	en_US
dcterms.available	September 2020	en_US
dcterms.creator	Yara Addabe', Yara Abu Hammad, Nataly Ayyad and Adnan Yahya	en_US
dcterms.format	Excel	en_US
dcterms.instructionalMethod	Automated collection from Twitter	en_US
dcterms.source	Twitter	en_US
newfileds.department	Engineering and Technology	en_US
newfileds.item-access-type	archive	en_US
newfileds.thesis-prog	none	en_US
newfileds.general-subject	Computers and Information Technology | الحاسوب وتكنولوجيا المعلومات	en_US
item.grantfulltext	open	-
item.languageiso639-1	other	-
item.fulltext	With Fulltext	-
Appears in Collections: 6. BZU Dataset Collection
Files in This Item:
File	Description	Size	Format
AuthorAttributionTweetsTrainingDataYara_2_Nataly.xlsx	Training Dataset Tweets	4.14 MB	Microsoft Excel XML
AuthorAttributionTweetsTestDataYara_2_Nataly.xlsx	Testing Dataset Tweets	1.06 MB	Microsoft Excel XML
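
The filtering steps listed in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code, and every name in it (keep_tweet, clean_tweet, NUM_TOKEN) is hypothetical. It assumes that retweets and replies can be recognized from the tweet text alone (tweet metadata from the API would likely be a more reliable signal), that "numbers were normalized" means replacing each digit run with a single placeholder token, and that "Arabic characters" means the basic letter range U+0621 to U+064A. Stop-word removal and lemmatization are left out because the record does not say which tools were used.

```python
import re

# Hyperlinks such as https://t.co/... or www.example.com
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
# Hashtags whose body is made of Latin letters, digits, or underscores.
LATIN_HASHTAG_RE = re.compile(r"#[A-Za-z0-9_]+")
# Runs of ASCII digits or Arabic-Indic digits.
NUMBER_RE = re.compile(r"[0-9\u0660-\u0669]+")
# Arabic diacritics (U+064B-U+0652) and the tatweel elongation mark (U+0640).
DIACRITICS_RE = re.compile(r"[\u064B-\u0652\u0640]")
# Anything that is neither a basic Arabic letter nor whitespace.
NON_ARABIC_RE = re.compile(r"[^\u0621-\u064A\s]")

# Assumption: numbers are normalized to one placeholder word ("number").
NUM_TOKEN = "رقم"


def keep_tweet(text, author_name):
    """Return False for retweets, replies, and tweets naming their author."""
    stripped = text.lstrip()
    if stripped.startswith("RT @"):   # text heuristic for a retweet
        return False
    if stripped.startswith("@"):      # text heuristic for a reply
        return False
    if author_name and author_name in text:
        return False
    return True


def clean_tweet(text):
    """Strip links, English hashtags, emojis, and non-Arabic characters."""
    text = URL_RE.sub(" ", text)
    text = LATIN_HASHTAG_RE.sub(" ", text)
    text = NUMBER_RE.sub(f" {NUM_TOKEN} ", text)
    text = DIACRITICS_RE.sub("", text)   # delete, so words are not split
    text = NON_ARABIC_RE.sub(" ", text)  # emojis, Latin letters, punctuation
    return " ".join(text.split())        # collapse repeated whitespace
```

In this sketch, keep_tweet would be applied first to discard whole tweets, and clean_tweet would then be applied to the text of each remaining tweet. Diacritics are deleted rather than replaced with a space so that a vocalized word is not broken into pieces.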

Page view(s): 538 (checked on Apr 14, 2024)
Download(s): 9,942 (checked on Apr 14, 2024)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.