Leveraging Social Networks for Effective Spam Filtering

The explosive growth of unsolicited email has prompted the development of numerous spam filtering techniques. Bayesian spam filters are superior to static keyword-based filters in that they continuously evolve to tackle new spam by learning keywords from new spam emails. However, Bayesian filters are easily poisoned by clever spammers who avoid spam keywords and pad their emails with innocuous words. They also need a significant amount of time to adapt to new spam based on user feedback. Moreover, few current spam filters exploit social networks to assist in spam detection. To develop an accurate and user-friendly spam filter, we propose SOAP, a SOcial network Aided Personalized and effective spam filter. In SOAP, each node connects to its social friends; that is, nodes form a distributed overlay by directly using social network links as overlay links. Each node uses SOAP to collect information and check for spam autonomously in a distributed manner. Unlike previous spam filters that focus on parsing keywords (e.g., Bayesian filters) or building blacklists, SOAP exploits the social relationships among email correspondents and their (dis)interests to detect spam adaptively and automatically. In each node, SOAP integrates four components into the basic Bayesian filter: social closeness-based spam filtering, social interest-based spam filtering, adaptive trust management, and friend notification. We evaluated SOAP through simulation based on trace data from Facebook and also implemented a SOAP prototype for real-world experiments. Experimental results show that SOAP greatly improves the performance of Bayesian spam filters in terms of accuracy, attack resilience, and efficiency of spam detection. The performance of the Bayesian spam filter serves as SOAP's lower bound.
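
To make the idea of layering a social signal on top of a basic Bayesian filter concrete, the following is a minimal sketch and not SOAP's actual algorithm. It trains a small naive Bayes classifier with add-one smoothing and then discounts the resulting spam probability by a sender closeness value in [0, 1]. The class name `NaiveBayesSpamFilter`, the function `adjust_for_closeness`, and the linear discount rule are all assumptions introduced for illustration; the abstract does not specify how closeness is computed or combined with the Bayesian score.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Tiny word-level naive Bayes filter with add-one smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.msg_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # label must be "spam" or "ham"
        self.word_counts[label].update(text.lower().split())
        self.msg_counts[label] += 1

    def spam_probability(self, text):
        spam, ham = self.word_counts["spam"], self.word_counts["ham"]
        vocab_size = len(set(spam) | set(ham)) + 1  # +1 slot for unseen words
        n_spam, n_ham = sum(spam.values()), sum(ham.values())
        # Smoothed prior log-odds of spam versus ham.
        log_odds = math.log(
            (self.msg_counts["spam"] + 1) / (self.msg_counts["ham"] + 1)
        )
        for word in text.lower().split():
            p_word_spam = (spam[word] + 1) / (n_spam + vocab_size)
            p_word_ham = (ham[word] + 1) / (n_ham + vocab_size)
            log_odds += math.log(p_word_spam / p_word_ham)
        # Clamp so math.exp cannot overflow on long messages.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


def adjust_for_closeness(p_spam, closeness):
    """Hypothetical rule: discount the Bayesian score for socially close senders.

    closeness is assumed to lie in [0, 1]; with closeness == 0 the score
    is unchanged, so the plain Bayesian result is recovered.
    """
    if not 0.0 <= closeness <= 1.0:
        raise ValueError("closeness must be in [0, 1]")
    return p_spam * (1.0 - closeness)
```

With a closeness of 0 (a stranger), the adjusted score equals the plain Bayesian score, which loosely mirrors the statement that the Bayesian filter is the lower bound; a close friend's message receives a lower spam score for the same text.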
