A Topic-Based Hidden Markov model for Real-Time spam tweets filtering

Washha, Mahdi; Qaroush, Aziz; Mezghani, Manel; Sedes, Florence

Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.11889/8201

DC Field	Value	Language
dc.contributor.author	Washha, Mahdi	en_US
dc.contributor.author	Qaroush, Aziz	en_US
dc.contributor.author	Mezghani, Manel	en_US
dc.contributor.author	Sedes, Florence	en_US
dc.date.accessioned	2023-11-21T09:49:39Z	-
dc.date.available	2023-11-21T09:49:39Z	-
dc.date.issued	2017	-
dc.identifier.uri	http://hdl.handle.net/20.500.11889/8201	-
dc.description.abstract	Online social networks (OSNs) have become an important source of information for a wide range of applications and research areas, such as search engines and summarization systems. However, the high usability and accessibility of OSNs have exposed many information quality (IQ) problems, which in turn degrade the performance of OSN-dependent applications. Social spammers are ill-intentioned users who degrade the quality of OSN information by misusing the services that OSNs provide. They spread large volumes of posts/tweets to lure legitimate users to malicious or commercial sites containing malware downloads, phishing, and drug sales. Because Twitter is not immune to the social spam problem, researchers have designed various detection methods that inspect individual tweets or accounts for spam content. However, despite their high detection rates, account-based spam detection methods are not suitable for real-time tweet filtering because they need information from Twitter’s servers. At the tweet level, many lightweight features have been proposed for real-time filtering; however, existing classification models classify each tweet separately, without considering the state of previously handled tweets associated with the same topic. These models also require periodic retraining on ground-truth data to stay up to date. Hence, in this paper, we formalize a Hidden Markov Model (HMM) as a time-dependent model for real-time topical spam tweet filtering. More precisely, our method leverages only the meta-data available in the tweet object to detect spam tweets existing in a stream of tweets related to a topic (e.g., #Trump), while considering the state of previously handled tweets associated with the same topic. Compared to classical time-independent classification methods such as Random Forest, the experimental evaluation demonstrates the efficiency of the method in increasing the quality of topics in terms of precision, recall, and F-measure.	en_US
dc.language.iso	en_US	en_US
dc.subject	Markov processes	en_US
dc.subject	Online social networks -- Security measures	en_US
dc.subject	Spam filtering (Electronic mail)	en_US
dc.subject	Real-Time	en_US
dc.subject	Twitter	en_US
dc.title	A Topic-Based Hidden Markov model for Real-Time spam tweets filtering	en_US
dc.type	Article	en_US
newfileds.department	Engineering and Technology	en_US
newfileds.item-access-type	open_access	en_US
newfileds.thesis-prog	none	en_US
newfileds.general-subject	none	en_US
item.languageiso639-1	other	-
item.fulltext	With Fulltext	-
item.grantfulltext	open	-
Appears in Collections:	Fulltext Publications
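
The abstract describes filtering a topic's tweet stream with a Hidden Markov Model so that each decision takes the state of previously handled tweets into account. The record does not give the paper's features or parameters, so the following is only a minimal sketch of generic HMM forward filtering under stated assumptions: two hidden states (non-spam, spam), a single hypothetical binary observation per tweet (whether it contains a URL), and made-up probabilities. The name `forward_filter` and all numbers are illustrative and are not taken from the paper.

```python
from typing import List, Sequence


def forward_filter(
    observations: Sequence[int],
    start: Sequence[float],
    trans: Sequence[Sequence[float]],
    emit: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Return P(state | tweets seen so far) for each tweet in the stream.

    start[j]    : prior probability of state j for the first tweet
    trans[i][j] : probability of moving from state i to state j
    emit[j][o]  : probability of observing symbol o in state j
    """
    n_states = len(start)
    beliefs: List[List[float]] = []
    prev: List[float] = []
    for obs in observations:
        if not prev:
            prior = list(start)
        else:
            # Predict: push the previous belief through the transition model.
            prior = [
                sum(prev[i] * trans[i][j] for i in range(n_states))
                for j in range(n_states)
            ]
        # Update: weight each state by how well it explains the observation.
        unnorm = [prior[j] * emit[j][obs] for j in range(n_states)]
        total = sum(unnorm)
        if total == 0.0:
            raise ValueError("observation has zero probability under the model")
        prev = [u / total for u in unnorm]
        beliefs.append(prev)
    return beliefs


# Hypothetical parameters (assumptions, not the paper's values).
# State 0 = non-spam, state 1 = spam; symbol 0 = no URL, symbol 1 = has URL.
START = [0.8, 0.2]
TRANS = [[0.9, 0.1], [0.3, 0.7]]
EMIT = [[0.7, 0.3], [0.2, 0.8]]

if __name__ == "__main__":
    stream = [1, 1, 0]  # URL, URL, no URL
    for t, belief in enumerate(forward_filter(stream, START, TRANS, EMIT)):
        print(f"tweet {t}: P(spam) = {belief[1]:.3f}")
```

With these assumed numbers, a single URL-bearing tweet does not cross a 0.5 spam threshold, but a second consecutive one does, which illustrates how a time-dependent model differs from classifying each tweet in isolation.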

Files in This Item:

File	Description	Size	Format
A Topic-Based Hidden Markov Model for Real-Time Spam Tweets Filtering.pdf		607.72 kB	Adobe PDF	View/Open

Page view(s): 9 (checked on Jan 20, 2024)
Download(s): 2 (checked on Jan 20, 2024)
