MOSubdue: a Pareto dominance-based multiobjective Subdue algorithm for frequent subgraph mining

Graph-based data mining approaches have mainly been proposed for the task popularly known as frequent subgraph mining, subject to a single user preference such as frequency or size. In this work, we propose to address the frequent subgraph mining problem from a multiobjective optimization viewpoint, where a subgraph (or solution) is evaluated by several user-defined preferences (or objectives) that are conflicting in nature. For example, mined subgraphs with high frequency are often small, and vice versa. Using such objectives in the multiobjective subgraph mining process generates Pareto-optimal subgraphs, where no subgraph is better than another subgraph in all objectives. We apply a Pareto dominance approach to the evaluation and search of subgraphs with regard to both proximity and diversity in the multiobjective sense, and incorporate it into the framework of the Subdue algorithm for subgraph mining. The method is called multiobjective subgraph mining by Subdue (MOSubdue) and has several advantages: (i) generation of Pareto-optimal subgraphs in a single run, (ii) selection of subgraph seeds from the candidate subgraphs based on all objectives, (iii) search in the multiobjective subgraph lattice space, and (iv) the capability to deal with different multiobjective frequent subgraph mining tasks by customizing the objectives tackled. The good performance of MOSubdue is shown by performing multiobjective subgraph mining defined by two and three objectives on two real-life datasets.
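The notion of Pareto dominance used above can be illustrated with a minimal sketch. It is not the MOSubdue implementation: the function names `dominates` and `pareto_front`, the candidate labels, and the objective values (support, size) are all hypothetical, and both objectives are assumed to be maximized.

```python
from typing import Dict, List, Tuple

# Hypothetical objective vector for a subgraph: (support, size), both maximized.
Objectives = Tuple[float, ...]


def dominates(a: Objectives, b: Objectives) -> bool:
    """Return True if a is at least as good as b in every objective
    and strictly better in at least one (maximization assumed)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(candidates: Dict[str, Objectives]) -> List[str]:
    """Return the labels of candidates that no other candidate dominates."""
    front = []
    for name, obj in candidates.items():
        dominated = any(
            dominates(other, obj)
            for other_name, other in candidates.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


# Illustrative (made-up) candidate subgraphs: frequent ones tend to be small.
candidates = {
    "s1": (10, 2),
    "s2": (6, 4),
    "s3": (2, 7),
    "s4": (5, 3),  # dominated by s2
    "s5": (2, 5),  # dominated by s3
}

front = pareto_front(candidates)
print(front)
```

In this toy example the non-dominated set contains the three subgraphs that trade support against size, while `s4` and `s5` are discarded because another candidate is at least as good in both objectives. A third objective could be added by extending each tuple, with no change to the functions.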
