Querying and ranking incomplete twigs in probabilistic XML

XML has become the de facto standard for information exchange on the Web, and a core operation in XML query processing is finding all occurrences of a twig pattern in an XML database. At the same time, probabilistic data has become an emerging topic for many Web applications, which makes twig pattern matching over probabilistic XML an important problem. In prior work on probabilistic XML, the answers to a given twig query are always complete. However, complete answers with low probabilities may be deemed irrelevant, whereas incomplete answers with high probabilities can be highly significant because they may be the potential answers that interest users. Evaluating incomplete twigs in probabilistic XML introduces new challenges compared with complete evaluation. First, an incomplete query returns not only complete matches but also a considerable number of incomplete matches. Second, processing incomplete evaluation is more complicated. Because so many answers can be returned, a ranking approach should accompany the evaluation of incomplete answers. In this paper, we propose an efficient algorithm for querying incomplete twigs over a probabilistic XML database and a novel algorithm for ranking the incomplete answers. Experimental results show that the proposed algorithms significantly improve the performance of querying and ranking incomplete twigs.
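To make the motivation concrete, the following minimal sketch shows why an incomplete answer with a high probability may deserve a higher rank than a complete answer with a low probability. It is not the algorithm proposed in the paper: the `Match` structure, the `rank` function, the `min_probability` threshold, and the ordering rule (probability first, completeness as a tie-breaker) are all assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """One candidate answer to a twig query (hypothetical structure)."""
    matched_nodes: int   # number of query nodes this answer binds
    probability: float   # probability that the answer exists in the document


def rank(matches, query_size, min_probability=0.0):
    """Order answers by probability, breaking ties by completeness.

    This ordering rule is an assumption for illustration, not the
    paper's ranking function.
    """
    kept = [m for m in matches if m.probability >= min_probability]
    return sorted(
        kept,
        key=lambda m: (m.probability, m.matched_nodes / query_size),
        reverse=True,
    )


# A twig query with four nodes and three candidate answers.
answers = [
    Match(matched_nodes=4, probability=0.05),  # complete but unlikely
    Match(matched_nodes=3, probability=0.80),  # incomplete but likely
    Match(matched_nodes=2, probability=0.80),  # less complete, equally likely
]

ranked = rank(answers, query_size=4)
filtered = rank(answers, query_size=4, min_probability=0.1)
```

Under these assumptions, the two incomplete answers precede the complete one in `ranked`, and `filtered` drops the complete answer entirely because its probability falls below the threshold.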
