NetSpam: a Network-based Spam Detection Framework for Reviews in Online Social Media
Abstract—Nowadays, a large share of people rely on content available in social media when making decisions (e.g., reviews and feedback on a topic or product). Because anybody can leave a review, spammers have a golden opportunity to write spam reviews about products and services for various interests. Identifying these spammers and their spam content is a hot research topic. Although a considerable number of studies have recently been carried out toward this end, the methodologies put forth so far still barely detect spam reviews, and none of them shows the importance of each extracted feature type. In this study, we propose a novel framework, named NetSpam, which utilizes spam features to model review datasets as heterogeneous information networks and thereby maps the spam detection procedure into a classification problem in such networks. Using the importance of spam features helps us obtain better results in terms of different metrics on real-world review datasets from the Yelp and Amazon websites. The results show that NetSpam outperforms the existing methods and that, among four categories of features (review-behavioral, user-behavioral, review-linguistic, and user-linguistic), the first category performs better than the others.
Existing work focuses only on detecting spam reviews and spammers; none of it shows the importance of each extracted feature type. On the other hand, a considerable amount of literature has been published on techniques for identifying spam and spammers, as well as on different types of analysis of this topic. These techniques can be classified into different categories: some use linguistic patterns in text, mostly based on unigrams and bigrams; others are based on behavioral patterns and rely on features extracted from patterns in users’ behavior, which are mostly metadata-based.
Disadvantages:
• Existing work is not sufficient to classify the spam network.
• There is a lack of work on detecting spam features.
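To make the linguistic category of existing techniques concrete, here is a minimal sketch of unigram and bigram counting over a review text. The function name `ngram_counts`, the tokenization rule, and the sample review are assumptions chosen for illustration; they are not taken from any of the surveyed methods.

```python
import re
from collections import Counter

def ngram_counts(text, n):
    """Count word n-grams in a review after lowercasing and simple tokenization."""
    # Assumed tokenizer: runs of letters, digits, and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    # Slide a window of length n over the token list.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)

# Hypothetical review text used only as an example.
review = "Great food great service"
unigrams = ngram_counts(review, 1)
bigrams = ngram_counts(review, 2)
```

Counts like these would typically be fed to a text classifier; behavioral techniques would instead draw on metadata such as ratings and timestamps.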
We propose the NetSpam framework, a novel network-based approach that models review networks as heterogeneous information networks. The general concept of our proposed framework is to model a given review dataset as a Heterogeneous Information Network (HIN) and to map the problem of spam detection into a HIN classification problem. In particular, we model a review dataset as a HIN in which reviews are connected through different node types (such as features and users). A weighting concept is then employed to calculate each feature’s importance (or weight). These weights are utilized to calculate the final labels for reviews using both unsupervised and supervised approaches.
• The importance of spam features helps us obtain better results in terms of different metrics on real-world review datasets.
• It initiates work on detecting spam features.
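The idea behind the proposed framework (reviews linked through shared feature nodes, a weight per feature, and weights combined into a final score) can be sketched roughly as follows. This is only an illustration under stated assumptions: the feature names `rating_deviation` and `burstiness`, the toy data, and both the weighting rule and the scoring rule are hypothetical simplifications, not the formulas used by NetSpam.

```python
from itertools import combinations

# Hypothetical toy data: each review has discretized feature levels and,
# for some reviews, a known label (1 = spam, 0 = genuine, None = unknown).
reviews = {
    "r1": {"features": {"rating_deviation": 2, "burstiness": 1}, "label": 1},
    "r2": {"features": {"rating_deviation": 2, "burstiness": 1}, "label": 1},
    "r3": {"features": {"rating_deviation": 0, "burstiness": 1}, "label": 0},
    "r4": {"features": {"rating_deviation": 2, "burstiness": 0}, "label": None},
}

def feature_links(reviews, feature):
    """Pairs of reviews linked through a shared feature node (same level)."""
    return [
        (u, v) for u, v in combinations(sorted(reviews), 2)
        if reviews[u]["features"][feature] == reviews[v]["features"][feature]
    ]

def feature_weight(reviews, feature):
    """Assumed weight: share of labeled linked pairs in which both reviews are spam."""
    labeled = [
        (u, v) for u, v in feature_links(reviews, feature)
        if reviews[u]["label"] is not None and reviews[v]["label"] is not None
    ]
    if not labeled:
        return 0.0
    both_spam = sum(reviews[u]["label"] * reviews[v]["label"] for u, v in labeled)
    return both_spam / len(labeled)

def spam_score(reviews, target, weights):
    """Assumed score: average, over known spam reviews, of the chance that
    at least one weighted feature link connects that review to the target."""
    spam = [r for r, d in reviews.items() if d["label"] == 1 and r != target]
    if not spam:
        return 0.0
    total = 0.0
    for s in spam:
        miss = 1.0  # probability that no weighted link connects s and target
        for f, w in weights.items():
            linked = reviews[s]["features"][f] == reviews[target]["features"][f]
            miss *= 1.0 - (w if linked else 0.0)
        total += 1.0 - miss
    return total / len(spam)

weights = {f: feature_weight(reviews, f) for f in ("rating_deviation", "burstiness")}
score_r4 = spam_score(reviews, "r4", weights)
```

In this toy setup the feature that links only spam reviews receives the larger weight, so an unlabeled review sharing that feature level with known spam gets a high score. A threshold on the score would then give the final label.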
SYSTEM REQUIREMENTS
HARDWARE REQUIREMENTS:
Hardware : Pentium
Speed : 1.1 GHz
RAM : 1 GB
Hard Disk : 20 GB
Development team :