Graph-Based Tools for Data Mining and Machine Learning

Many powerful methods for intelligent data analysis have become available in the fields of machine learning and data mining. However, almost all of these methods assume that the objects under consideration are represented as feature vectors, that is, as collections of attribute values. In the present paper we argue that symbolic representations, such as strings, trees, or graphs, have significantly higher representational power than feature vectors. On the other hand, the operations on these data structures that are typically needed in data mining and machine learning are more involved than their counterparts on feature vectors. Nevertheless, recent progress in graph matching and related areas has led to many new practical methods that seem very promising for a wide range of applications.
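To illustrate why operations on symbolic data are more involved than their feature-vector counterparts, the following minimal sketch contrasts a distance between two feature vectors with a distance between two strings. It is only an illustration under stated assumptions, not a method taken from the paper: the function names `euclidean` and `edit_distance` are hypothetical, and the string distance assumes unit costs for insertion, deletion, and substitution.

```python
import math


def euclidean(u, v):
    """Distance between two equal-length feature vectors: one pass, O(n)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def edit_distance(s, t):
    """String edit distance with unit costs (an assumption made here).

    Uses dynamic programming over all prefix pairs, so it needs
    O(len(s) * len(t)) time, compared with O(n) for the vector case.
    """
    # prev[j] holds the distance between the current prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        curr = [i]  # distance from s[:i] to the empty string
        for j, b in enumerate(t, 1):
            curr.append(min(
                prev[j] + 1,             # delete a from s
                curr[j - 1] + 1,         # insert b into s
                prev[j - 1] + (a != b),  # substitute a by b, or match
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(euclidean((0.0, 0.0), (3.0, 4.0)))
    print(edit_distance("kitten", "sitting"))
```

The vector distance compares components position by position, whereas the string distance must search over possible alignments of the two sequences. For graphs, the analogous matching problem is harder still, because no natural ordering of the nodes is available.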
