Automata Approach to XML Data Indexing

The internal structure of an XML document can be viewed as a tree. Trees are among the fundamental, well-studied data structures in computer science; they express hierarchical structure and are used in many applications. This paper focuses on processing tree data structures and, in particular, studies the XML index problem. Although many state-of-the-art methods exist, the XML index problem remains an active research area. Existing methods, however, usually lack a clear, systematic grounding in the standard theory of formal languages and automata. We therefore present new methods that solve the XML index problem using automata theory. These methods are simple and allow a small subset of XPath to be processed efficiently. Given an XML data structure, they can serve as auxiliary data structures for answering a particular set of queries, e.g., XPath queries using any combination of the child and descendant-or-self axes. Given an XML tree model with n nodes, the searching phase uses the index, reads an input query of size m, and finds the answer in time O(m), which does not depend on the size of the original XML document.
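To make the idea of an automaton used as an index concrete, the following is a minimal sketch and not the construction presented in the paper. It assumes the tree is given as two hypothetical arrays, `labels` and `parent`, and it precomputes a deterministic automaton whose states are sets of tree nodes and whose input symbols are (axis, element name) pairs. The function names `build_index`, `parse`, and `search` are illustrative only. The naive subset construction used here may create many states on some trees; it only illustrates why the searching phase can follow one transition per query step.

```python
from collections import deque
import re

def build_index(labels, parent):
    """Sketch: labels[i] is the element name of node i,
    parent[i] is its parent id (-1 for the root element)."""
    n = len(labels)
    children = {i: [] for i in range(-1, n)}  # -1 is a virtual document root
    for v, p in enumerate(parent):
        children[p].append(v)

    def descendants(v):
        # All proper descendants of v, found by an explicit-stack traversal.
        stack, out = list(children[v]), []
        while stack:
            u = stack.pop()
            out.append(u)
            stack.extend(children[u])
        return out

    start = frozenset([-1])
    state_id = {start: 0}   # node set -> state number
    node_sets = [start]     # state number -> node set (the query answer)
    delta = [{}]            # state number -> {(axis, name): next state}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        targets = {}
        for v in s:
            for u in children[v]:
                targets.setdefault(('/', labels[u]), set()).add(u)
            for u in descendants(v):
                targets.setdefault(('//', labels[u]), set()).add(u)
        for sym, t in targets.items():
            t = frozenset(t)
            if t not in state_id:
                state_id[t] = len(node_sets)
                node_sets.append(t)
                delta.append({})
                queue.append(t)
            delta[state_id[s]][sym] = state_id[t]
    return delta, node_sets

def parse(query):
    # Split e.g. "/a//b" into [('/', 'a'), ('//', 'b')].
    return re.findall(r'(//|/)([^/]+)', query)

def search(index, query):
    """Follow one transition per query step; the document is never revisited."""
    delta, node_sets = index
    q = 0
    for step in parse(query):
        q = delta[q].get(step)
        if q is None:
            return frozenset()
    return node_sets[q]

# Example tree: a(0) has children b(1), b(3), c(4); b(1) has child c(2).
index = build_index(['a', 'b', 'c', 'b', 'c'], [-1, 0, 1, 0, 0])
print(sorted(search(index, '/a/b')))   # nodes matched by /a/b
print(sorted(search(index, '//b/c')))  # nodes matched by //b/c
```

In this sketch, `search` performs one dictionary lookup per query step, so with constant-time hashing the search cost grows with the number of steps m rather than with the number of nodes n. All dependence on the document size is moved into `build_index`.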
