MultiAspectForensics: mining large heterogeneous networks using tensor

Modern applications such as web knowledge bases, network traffic monitoring, and online social networks involve an unprecedented amount of 'heterogeneous' network data, with rich types of interactions among nodes. How can we find patterns and anomalies in heterogeneous networks with millions of edges that carry high-dimensional attributes, in a scalable way? We introduce MultiAspectForensics, a novel tool that automatically detects and visualises bursts of specific sub-graph patterns within a local community of nodes as anomalies in a heterogeneous network, leveraging scalable tensor analysis methods. One such pattern consists of a set of vertices that form a dense bipartite graph whose edges share exactly the same set of attributes. We present empirical results of the proposed method on three datasets from distinct application domains and discuss insights derived from the discovered patterns. Moreover, we empirically show that our algorithm can feasibly be applied to higher-dimensional datasets.
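To make the bipartite pattern described above concrete, the following is a minimal sketch in Python. It is not the tensor-based algorithm of MultiAspectForensics; it is a brute-force illustration of the pattern definition only. The edge format `(source, destination, attributes)`, the function name `bipartite_blocks`, the thresholds `min_density` and `min_edges`, and the toy data are all assumptions made for this example.

```python
from collections import defaultdict


def bipartite_blocks(edges, min_density=0.8, min_edges=4):
    """Group edges by identical attribute set and report dense bipartite blocks.

    edges: iterable of (source, destination, attributes) tuples, where
    attributes is any iterable of hashable labels (hypothetical format).
    """
    # Edges belong to the same group only if their attribute sets match exactly.
    groups = defaultdict(set)
    for src, dst, attrs in edges:
        groups[frozenset(attrs)].add((src, dst))

    blocks = []
    for attrs, pairs in groups.items():
        sources = {s for s, _ in pairs}
        dests = {d for _, d in pairs}
        # Density: fraction of all possible source-destination pairs present.
        density = len(pairs) / (len(sources) * len(dests))
        if len(pairs) >= min_edges and density >= min_density:
            blocks.append((sorted(attrs), sorted(sources), sorted(dests), density))
    return blocks


# Toy data (invented): two sources contact three destinations with
# identical attributes, plus two unrelated background edges.
edges = [(s, d, {"tcp", "port=445"})
         for s in ("a1", "a2") for d in ("b1", "b2", "b3")]
edges += [("c1", "b1", {"tcp", "port=80"}), ("a1", "c2", {"udp", "port=53"})]

blocks = bipartite_blocks(edges)
for attrs, sources, dests, density in blocks:
    print(attrs, sources, dests, density)
```

On the toy data, only the six edges sharing the same attribute set form a block; the background edges fall into groups that are too small to report. Grouping by exact attribute set is used here for clarity and would not scale to millions of edges with high-dimensional attributes, which is the case the abstract addresses with tensor analysis.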
