A Survey on Social Media Anomaly Detection

Social media anomaly detection is of critical importance for preventing malicious activities such as bullying, terrorist attack planning, and the dissemination of fraudulent information. With the recent popularity of social media, new types of anomalous behavior have arisen, causing concern among various parties. While a large amount of work has been dedicated to traditional anomaly detection problems, we observe a surge of research interest in the new realm of social media anomaly detection. In this paper, we present a survey of existing approaches to this problem. We focus on the new types of anomalous phenomena in social media and review the recently developed techniques for detecting them. We provide a general overview of the problem domain, common formulations, existing methodologies, and potential directions. With this work, we hope to draw the attention of the research community to this challenging problem and open up new directions for future contributions.
