Tree pattern matching in heterogeneous fuzzy XML databases

Handling the heterogeneous data underlying fuzzy XML databases is challenging for document management and knowledge discovery tasks, because the structural heterogeneity and uncertainty of large numbers of XML data sources make it difficult to answer structured queries effectively, especially tree-pattern queries. To address this issue, this paper proposes a novel framework for managing fuzzy XML queries in a heterogeneous environment. In particular, we devise a holistic algorithm for matching tree patterns over heterogeneous fuzzy XML data. Our approach adopts a compact stack technique and generates matches in a single scan of the data relevant to the tree pattern, which avoids re-scanning unnecessary portions of the XML documents and eliminates redundant intermediate results. Finally, a comprehensive experimental evaluation on real and synthetic data sets demonstrates the significance of our approach as a solution for querying heterogeneous data in fuzzy XML documents.
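The abstract does not spell out the algorithm, so the following is only a rough illustration of the holistic, stack-based idea it builds on: a minimal Python sketch of a PathStack-style join (Bruno, Koudas and Srivastava, SIGMOD 2002) for a path pattern with ancestor-descendant edges only. It is not the authors' compact-stack algorithm, which handles full tree patterns over fuzzy XML. Assumptions, none of them taken from the paper: each element carries a region encoding (start, end) and a single membership degree; the degree of a match is the minimum of its elements' degrees; and structural heterogeneity is modelled by a hypothetical `synonyms` map from query tags to alternative element tags. `Element`, `build_streams` and `path_stack` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """An XML element with region encoding (start, end) and a fuzzy degree.

    `degree` is a hypothetical per-element membership/possibility in [0, 1].
    """
    tag: str
    start: int
    end: int
    degree: float


def build_streams(elements, path, synonyms):
    """One stream per query step, sorted by start position.

    `synonyms` (hypothetical) maps a query tag to alternative tags used by
    heterogeneous sources, e.g. {"author": {"writer"}}.
    """
    streams = []
    for tag in path:
        accepted = {tag} | synonyms.get(tag, set())
        matching = [e for e in elements if e.tag in accepted]
        streams.append(sorted(matching, key=lambda e: e.start))
    return streams


def path_stack(elements, path, synonyms=None, threshold=0.0):
    """Match path[0]//path[1]//...//path[-1], reading each stream once."""
    streams = build_streams(elements, path, synonyms or {})
    cursors = [0] * len(path)
    # Stack entry: (element, index of the parent stack's top when pushed).
    stacks = [[] for _ in path]
    leaf = len(path) - 1
    results = []

    def emit(level, idx, suffix):
        elem, parent_top = stacks[level][idx]
        chain = [elem] + suffix
        if level == 0:
            degree = min(e.degree for e in chain)  # min t-norm (assumption)
            if degree >= threshold:
                labels = tuple(f"{e.tag}@{e.start}" for e in chain)
                results.append((labels, degree))
            return
        # Entries 0..parent_top of the parent stack are ancestors of elem;
        # the start check excludes elem itself when a tag repeats in path.
        for i in range(parent_top + 1):
            if stacks[level - 1][i][0].start < elem.start:
                emit(level - 1, i, chain)

    while cursors[leaf] < len(streams[leaf]):
        # Choose the query step whose next element starts first.
        live = [q for q in range(len(path)) if cursors[q] < len(streams[q])]
        q = min(live, key=lambda k: streams[k][cursors[k]].start)
        nxt = streams[q][cursors[q]]
        # Pop entries whose regions end before nxt begins: they cannot be
        # ancestors of nxt or of any element later in document order.
        for stack in stacks:
            while stack and stack[-1][0].end < nxt.start:
                stack.pop()
        # Push only if nxt can extend a partial match from the parent step.
        if q == 0 or stacks[q - 1]:
            parent_top = len(stacks[q - 1]) - 1 if q > 0 else -1
            stacks[q].append((nxt, parent_top))
            if q == leaf:
                emit(leaf, len(stacks[leaf]) - 1, [])
                stacks[leaf].pop()
        cursors[q] += 1
    return results


# Toy document: two sources label the same role "author" and "writer".
doc = [
    Element("bib", 1, 20, 1.0),
    Element("book", 2, 9, 0.9),
    Element("author", 3, 4, 0.8),
    Element("writer", 5, 6, 0.6),
    Element("title", 7, 8, 1.0),
    Element("book", 10, 15, 0.7),
    Element("author", 11, 12, 1.0),
]
# Three matches for bib//book//author, with degrees 0.8, 0.6 and 0.7.
for labels, degree in path_stack(doc, ["bib", "book", "author"], {"author": {"writer"}}):
    print(labels, degree)
```

Each stream is read once, front to back, and the stacks only ever hold a chain of nested elements, which is the property that lets holistic joins avoid materializing and later discarding partial matches. Dropping the synonym map leaves only the two literal `author` matches, and raising `threshold` prunes low-degree answers.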
