An approach for detecting spam in Arabic opinion reviews

Little quality control exists over the rapidly growing amount of information available on the Internet, especially user-generated content. Manually scanning large volumes of user-generated content is time-consuming and sometimes impossible, which makes opinion mining a better alternative. Although opinion reviews are recognized to contain valuable information for a variety of applications, the lack of quality control attracts spammers, who have found many ways to profit from spamming. Moreover, spam detection is a complex problem because spammers continually invent new methods that cannot be easily recognized. There is therefore a need for new approaches to identifying spam in opinion reviews. Such approaches exist for English, but one is also needed for Arabic in order to identify Arabic spam reviews. To the best of our knowledge, no published study yet addresses spam detection in Arabic reviews. In this research, we propose a new approach to spam detection in Arabic opinion reviews that merges methods from data mining and text mining into a single classification approach. Our work builds on state-of-the-art spam detection techniques for Latin-based languages while taking into account the specific nature of the Arabic language. In addition, we overcome the drawbacks of the class imbalance problem by using sampling techniques. The experimental results show that the proposed approach is effective in identifying Arabic spam opinion reviews. Our machine learning approach achieves significant improvements: in the best case, the F-measure improves to 99.59%.
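
The abstract does not state which sampling technique is used or how the F-measure is computed. As a minimal sketch only, assuming random oversampling of the minority class and the balanced F-measure (F1) on the spam class, the two ideas might look as follows. The function names `random_oversample` and `f_measure`, the label strings, and the toy data are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until every
    class has as many samples as the largest one (assumed technique)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw extra copies with replacement from the same class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def f_measure(y_true, y_pred, positive="spam"):
    """Balanced F-measure: harmonic mean of precision and recall
    for the class named by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy imbalanced set: 2 spam reviews versus 6 non-spam reviews.
    reviews = ["r%d" % i for i in range(8)]
    labels = ["spam"] * 2 + ["ham"] * 6
    _, balanced_labels = random_oversample(reviews, labels)
    print(Counter(balanced_labels))
```

Oversampling is typically applied to the training split only, so that duplicated reviews do not leak into the data on which the F-measure is reported.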
