Clustering Arabic tweets for sentiment analysis

Abuaiadah, Diab; Rajendran, Dileep; Jarrar, Mustafa

Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.11889/8525

Title:	Clustering Arabic tweets for sentiment analysis
Authors:	Abuaiadah, Diab; Rajendran, Dileep; Jarrar, Mustafa
Keywords:	Electronic data processing - Distributed processing - Arabic language;Linguistic analysis (Linguistics) - Arabic language;Collocation (Linguistics), Arabic;Algorithms Machine learning, Arabic;Cluster analysis - Computer programs;Social media - Arab countries
Issue Date:	2018
Publisher:	Proceedings of IEEE/ACS International Conference on Computer Systems and Applications, AICCSA
Abstract:	This study evaluates the impact of linguistic preprocessing and similarity functions on clustering Arabic tweets. The experiments apply an optimized version of the standard K-Means algorithm to assign tweets to positive and negative categories. The results show that root-based stemming has a significant advantage over light stemming in all settings. The Averaged Kullback-Leibler Divergence similarity function clearly outperforms the Cosine, Pearson Correlation, Jaccard Coefficient and Euclidean functions. The combination of the Averaged Kullback-Leibler Divergence and root-based stemming achieved the highest purity, 0.764, while the second-best purity was 0.719. These results are important because they contrast with findings for normal-sized documents, where light stemming performs better than root-based stemming in many information retrieval applications and the Cosine function is commonly used.
URI:	http://hdl.handle.net/20.500.11889/8525
DOI:	10.1109/AICCSA.2017.162
Appears in Collections:	Fulltext Publications
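
Note on the purity figures quoted in the abstract: this record does not include the paper's evaluation code, so the following is only a minimal sketch of the standard purity measure for a clustering, assuming each tweet has one cluster assignment and one gold sentiment label. The function name `purity` and the toy labels are illustrative and do not come from the paper.

```python
from collections import Counter


def purity(cluster_ids, true_labels):
    """Fraction of items that carry the majority label of their own cluster."""
    if len(cluster_ids) != len(true_labels) or not true_labels:
        raise ValueError("inputs must be non-empty and of equal length")

    # Count how often each gold label occurs inside each cluster.
    label_counts = {}
    for cluster, label in zip(cluster_ids, true_labels):
        label_counts.setdefault(cluster, Counter())[label] += 1

    # Sum the size of the majority label in every cluster, then normalise.
    majority_total = sum(max(counts.values()) for counts in label_counts.values())
    return majority_total / len(true_labels)


# Hypothetical toy data: six tweets split into two clusters.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["pos", "pos", "neg", "neg", "neg", "pos"]
score = purity(clusters, labels)
```

Under this definition a purity of 1.0 means every cluster contains tweets of a single sentiment. With two balanced classes and two clusters, a value near 0.5 would indicate no useful separation.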

Files in This Item:

File	Description	Size	Format
Clustering Arabic tweets for sentiment analysis.pdf		944 kB	Adobe PDF	View/Open
