A survey of data mining and social network analysis based anomaly detection techniques

Abstract With the growing use of online social networks (OSNs) across domains, social network analysis has become a major focus of research. OSNs have attracted researchers' interest both for the analysis of their usage and for the detection of abnormal activities. Anomalous activities in social networks are unusual and illegal activities whose behavior differs from that of the other entities in the same structure. This paper discusses different types of anomalies and presents a novel categorization of them based on various characteristics. It reviews a number of techniques for preventing and detecting anomalies, together with their underlying assumptions and the reasons such anomalies arise, and surveys a number of data mining approaches used to detect anomalies. Special attention is given to social-network-centric anomaly detection techniques, which are broadly classified as behavior based, structure based, and spectral based. Each of these classes comprises a number of techniques, which are discussed in the paper. The paper concludes with future directions and open areas of research.
