Outlier Detection from Network Data with Subnetwork Interpretation

Detecting a small number of outliers from a set of data observations is always challenging. This problem is more difficult in the setting of multiple network samples, where computing the anomalous degree of a network sample is generally not sufficient. In fact, explaining why a given network is exceptional, expressed in the form of subnetwork, is also equally important. We develop a novel algorithm to address these two key problems. We treat each network sample as a potential outlier and identify subnetworks that help discriminate it from nearby samples. The algorithm is developed in the framework of network regression combined with the constraints on both network topology and L1-norm shrinkage to perform subnetwork discovery. Our method thus goes beyond subspace/subgraph discovery. We also show that the developed method converges to a global optimum. Empirical evaluation on various real-world network datasets demonstrates the advantages of our algorithm over various baseline methods.

[1]  Yizhou Sun,et al.  Integrating community matching and outlier detection for mining evolutionary community outliers , 2012, KDD.

[2]  Lawrence B. Holder,et al.  Discovering Structural Anomalies in Graph-Based Data , 2007, Seventh IEEE International Conference on Data Mining Workshops (ICDMW 2007).

[3]  Alexander J. Smola,et al.  Learning with Kernels: support vector machines, regularization, optimization, and beyond , 2001, Adaptive computation and machine learning series.

[4]  Arthur Zimek,et al.  Discriminative features for identifying and interpreting outliers , 2014, 2014 IEEE 30th International Conference on Data Engineering.

[5]  Hans-Peter Kriegel,et al.  Outlier Detection in Arbitrarily Oriented Subspaces , 2012, 2012 IEEE 12th International Conference on Data Mining.

[6]  Ira Assent,et al.  Explaining Outliers by Subspace Separability , 2013, 2013 IEEE 13th International Conference on Data Mining.

[7]  Trevor Hastie,et al.  The Elements of Statistical Learning , 2001 .

[8]  Ambuj K. Singh,et al.  Discriminative Subnetworks with Regularized Spectral Learning for Global-State Network Data , 2014, ECML/PKDD.

[9]  Philip S. Yu,et al.  Outlier detection in graph streams , 2011, 2011 IEEE 27th International Conference on Data Engineering.

[10]  S. Rha,et al.  Whole genome analysis for liver metastasis gene signatures in colorectal cancer , 2007, International journal of cancer.

[11]  Nagiza F. Samatova,et al.  Community-based anomaly detection in evolutionary networks , 2012, Journal of Intelligent Information Systems.

[12]  Klemens Böhm,et al.  HiCS: High Contrast Subspaces for Density-Based Outlier Ranking , 2012, 2012 IEEE 28th International Conference on Data Engineering.

[13]  Ambuj K. Singh,et al.  NetSpot: Spotting Significant Anomalous Regions on Dynamic Networks , 2013, SDM.

[14]  Danai Koutra,et al.  Graph based anomaly detection and description: a survey , 2014, Data Mining and Knowledge Discovery.

[15]  Mark W. Schmidt,et al.  Fast Optimization Methods for L1 Regularization: A Comparative Study and Two New Approaches , 2007, ECML.

[16]  Hans-Peter Kriegel,et al.  LOF: identifying density-based local outliers , 2000, SIGMOD '00.

[17]  Jignesh M. Patel,et al.  Efficient aggregation for graph summarization , 2008, SIGMOD Conference.

[18]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[19]  Fan Chung Graham,et al.  Local Graph Partitioning using PageRank Vectors , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[20]  Ambuj K. Singh,et al.  Mining Heavy Subgraphs in Time-Evolving Networks , 2011, 2011 IEEE 11th International Conference on Data Mining.

[21]  Emmanuel Müller,et al.  Focused clustering and outlier detection in large attributed graphs , 2014, KDD.

[22]  Pascal Frossard,et al.  The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains , 2012, IEEE Signal Processing Magazine.

[23]  Christos Faloutsos,et al.  oddball: Spotting Anomalies in Weighted Graphs , 2010, PAKDD.

[24]  Anthony Widjaja,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2003, IEEE Transactions on Neural Networks.

[25]  Paul Barford,et al.  Intrusion as (anti)social communication: characterization and detection , 2012, KDD.

[26]  Gustavo E. A. P. A. Batista,et al.  A study of the behavior of several methods for balancing machine learning training data , 2004, SKDD.

[27]  Christos Faloutsos,et al.  It's who you know: graph mining using recursive structural features , 2011, KDD.

[28]  Yizhou Sun,et al.  On community outliers and their efficient detection in information networks , 2010, KDD.

[29]  Charu C. Aggarwal,et al.  Event Detection in Social Streams , 2012, SDM.

[30]  Leman Akoglu,et al.  Scalable Anomaly Ranking of Attributed Neighborhoods , 2016, SDM.

[31]  Hans-Peter Kriegel,et al.  Angle-based outlier detection in high-dimensional data , 2008, KDD.

[32]  Hans-Peter Kriegel,et al.  A survey on unsupervised outlier detection in high‐dimensional numerical data , 2012, Stat. Anal. Data Min..

[33]  Ashutosh Kumar Singh,et al.  The Elements of Statistical Learning: Data Mining, Inference, and Prediction , 2010 .

[34]  Yixin Chen,et al.  A Reduction of the Elastic Net to Support Vector Machines with an Application to GPU Computing , 2014, AAAI.

[35]  S. Sathiya Keerthi,et al.  Building Support Vector Machines with Reduced Classifier Complexity , 2006, J. Mach. Learn. Res..

[36]  Jimeng Sun,et al.  Neighborhood formation and anomaly detection in bipartite graphs , 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[37]  Shiliang Sun,et al.  A review of optimization methodologies in support vector machines , 2011, Neurocomputing.

[38]  Chih-Jen Lin,et al.  A dual coordinate descent method for large-scale linear SVM , 2008, ICML '08.

[39]  Katya Scheinberg,et al.  An Efficient Implementation of an Active Set Method for SVMs , 2006, J. Mach. Learn. Res..

[40]  Ambuj K. Singh,et al.  Learning Predictive Substructures with Regularization for Network Data , 2015, 2015 IEEE International Conference on Data Mining.