Algorithmics and applications of tree and graph searching

Modern search engines answer keyword-based queries extremely efficiently. Their speed comes from clever inverted index structures, caching, domain-independent knowledge of strings, and thousands of machines. Several research efforts have attempted to generalize keyword search to keytree and keygraph searching, because trees and graphs have many applications in next-generation database systems. This paper surveys both algorithms and applications, with some emphasis on our own work.
