Storage cost of spam 2.0 in a web discussion forum

This paper presents empirical research on the cost of Spam 2.0. The experiment is part of ongoing research into identifying that cost and focuses only on storage cost. The data were collected over a period of 13 months via a honeypot set up as a web discussion forum. Forums provide a convenient venue for spammers to carry out their spamming activities. Spamming imposes both direct and indirect costs on forum owners and forum users. In this paper, we present a method to measure direct cost, focusing only on storage cost. The experiment covers 450,772 posts, 141 personal messages and 62,798 profiles, which together occupy 2.69 GB of storage space. We first define our cost formula. We then set up a web-based discussion forum and collect the information posted on it. This data is pre-processed to extract the information required by our formula. To identify the storage used by spam, we define related attributes, based on maximum storage and impact factor features, which we call spam units, and measure the storage taken by all these spam units. We evaluate the cost of storage against three sources: our own self-hosted server, a commercial web hosting package and a cloud hosting package. The resulting storage cost for our research forum is AUD 23.66 on the self-hosted server, AUD 133.90 for commercial web hosting, and AUD 11.53 for cloud hosting. The highest storage costs for 10,000 spam posts, profiles and personal messages are AUD 2.963, AUD 0.068 and AUD 0.056, respectively.
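
The abstract does not state the cost formula itself, so the following is only a consistency check on the reported figures, not the authors' method. It assumes that the cost per 10,000 units is a linear normalization of the cost attributed to each content type, and that the "highest" per-10,000 figures correspond to the most expensive source (commercial web hosting). Both are assumptions; the function and variable names are hypothetical.

```python
# Figures taken from the abstract.
counts = {"posts": 450_772, "profiles": 62_798, "personal_messages": 141}
highest_cost_per_10k_aud = {
    "posts": 2.963,
    "profiles": 0.068,
    "personal_messages": 0.056,
}
commercial_hosting_total_aud = 133.90


def implied_total(cost_per_10k: float, count: int) -> float:
    """Scale a per-10,000-unit cost back up to the full item count.

    Assumes linear normalization, which the abstract does not confirm.
    """
    return cost_per_10k * count / 10_000


# Cost implied for each content type, then summed across types.
implied = {
    kind: implied_total(highest_cost_per_10k_aud[kind], counts[kind])
    for kind in counts
}
total = sum(implied.values())

for kind, cost in implied.items():
    print(f"{kind}: AUD {cost:.4f}")
print(f"implied total: AUD {total:.2f}")
print(f"difference from reported: AUD {total - commercial_hosting_total_aud:.2f}")
```

Under these assumptions the implied total lands within about AUD 0.10 of the reported commercial hosting cost, with posts accounting for nearly all of it. The small gap is consistent with rounding in the published per-10,000 figures.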
