Mining concise patterns on graph-connected itemsets

Abstract The itemset is a basic and common form of data. By mining patterns from itemsets, people can discover implicit regularities and gain new insights into their business. In some real applications, e.g., network alarm association, the itemsets usually have two characteristics: (1) the observed samples come from different entities, whose static properties imply inherent structural relationships; (2) the samples are scarce, which may lead to incomplete pattern extraction. This paper considers how to efficiently find a concise set of patterns on this kind of data. First, we use a graph to express the entities and their interconnections, and propagate every sample to every node with a weight determined by a predefined combination of kernel functions based on the similarities of the nodes and patterns. Next, the weights are incorporated naturally into an MDL-based filtering process, which yields a differentiated pattern set for each node. Experiments on simulated and real data show that the solution outperforms both the global solution (treating all nodes as one) and the isolated solution (removing all edges), and its effectiveness and scalability are further verified in an application to large-scale network operation and maintenance.
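The propagation step described above can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not specify the kernel combination, so the sketch assumes a single stand-in kernel, exp(-beta * hops), over breadth-first hop distance. The function names, the `beta` parameter, and the toy data are all hypothetical.

```python
from collections import deque
from math import exp


def hop_distances(adj, source):
    """Breadth-first hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def propagation_weights(adj, beta=1.0):
    """weights[target][origin] = exp(-beta * hops), or 0.0 if unreachable.

    Assumption: a simple exponential decay stands in for the paper's
    predefined combination of kernel functions.
    """
    weights = {}
    for target in adj:
        dist = hop_distances(adj, target)
        weights[target] = {
            origin: exp(-beta * dist[origin]) if origin in dist else 0.0
            for origin in adj
        }
    return weights


def weighted_support(samples, weights, node, pattern):
    """Total weight, as seen from node, of the samples that contain pattern."""
    pattern = frozenset(pattern)
    return sum(
        weights[node][origin]
        for origin, items in samples
        if pattern <= frozenset(items)
    )


# Hypothetical toy data: a path graph A - B - C and four samples,
# each recorded as (originating node, itemset).
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
samples = [
    ("A", {"x", "y"}),
    ("A", {"x", "y"}),
    ("B", {"x"}),
    ("C", {"y", "z"}),
]

weights = propagation_weights(adj, beta=1.0)
support_at_A = weighted_support(samples, weights, "A", {"x", "y"})
support_at_C = weighted_support(samples, weights, "C", {"x", "y"})
```

In this sketch the pattern {x, y} keeps its full support at node A, where both supporting samples originate, and a much smaller support at node C, two hops away. On a connected graph, setting `beta` to 0 gives every sample weight 1 at every node, which corresponds to treating all nodes as one; deleting all edges leaves each node with only its own samples, which corresponds to the isolated solution. A per-node weighted support of this kind is the sort of quantity an MDL-based filter could consume, although how the paper does so is not stated in the abstract.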
