Full metadata record
DC Field	Value	Language
dc.contributor.author	Abuaiadah, Diab
dc.contributor.author	Rajendran, Dileep
dc.contributor.author	Jarrar, Mustafa
dc.description.abstract	The focus of this study is to evaluate the impact of linguistic preprocessing and similarity functions for clustering Arabic Twitter tweets. The experiments apply an optimized version of the standard K-Means algorithm to assign tweets to positive and negative categories. The results show that root-based stemming has a significant advantage over light stemming in all settings. The Averaged Kullback-Leibler Divergence similarity function clearly outperforms the Cosine, Pearson Correlation, Jaccard Coefficient and Euclidean functions. The combination of the Averaged Kullback-Leibler Divergence and root-based stemming achieved the highest purity of 0.764, while the second-best purity was 0.719. These results are important because they run contrary to findings for normal-sized documents, where, in many information retrieval applications, light stemming performs better than root-based stemming and the Cosine function is commonly used.	en_US
dc.subject	Language and emotions - Arab countries	en_US
dc.subject	Similarity (Language learning)	en_US
dc.subject.lcsh	Online social networks
dc.subject.lcsh	Arabic language - Terms and phrases
dc.subject.lcsh	Cluster analysis
dc.subject.lcsh	Ontologies (Information retrieval)
dc.title	Clustering Arabic tweets for sentiment analysis	en_US
dc.type	Conference Proceedings	en_US
newfileds.department	Engineering and Technology	en_US
newfileds.conference	IEEE/ACS 14th International Conference on Computer Systems and Applications	en_US
newfileds.general-subject	Computers and Information Technology | الحاسوب وتكنولوجيا المعلومات	en_US
Appears in Collections: Fulltext Publications

Files in This Item:
File	Description	Size	Format
ARJ17.pdf		787.79 kB	Adobe PDF
