Paper information - Keyword searches in data-centric XML documents using tree partitioning

Abstract: This paper presents an effective keyword search method for data-centric extensible markup language (XML) documents. The method divides an XML document into compact, connected, integral subtrees, called self-integral trees (SI-Trees), to capture the structural information in the document. The SI-Trees are generated based on a schema guide. Meaningful self-integral trees (MSI-Trees), which contain all or some of the input keywords, are then identified as candidate answers to the keyword search. An index accelerates the retrieval of MSI-Trees related to the input keywords, and the MSI-Trees are ranked to identify the top-k results. Extensive tests demonstrate that this method takes 10–100 ms to answer a keyword query and outperforms existing approaches by 1–2 orders of magnitude.

Guoliang Li | Lizhu Zhou | Jianhua Feng
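
The pipeline outlined in the abstract (partition, index, rank) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not give the SI-Tree construction rules or the ranking function, so the sketch assumes a hypothetical schema guide (`SI_ROOT_TAGS`, a set of tag names whose elements root one subtree each) and scores each subtree simply by the number of distinct query keywords it contains. The function names and the sample document are likewise illustrative assumptions.

```python
import heapq
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumption: a hypothetical "schema guide" listing the tags that root a subtree.
SI_ROOT_TAGS = {"paper"}


def partition(root, si_root_tags):
    """Return, in document order, the elements that root a subtree."""
    return [el for el in root.iter() if el.tag in si_root_tags]


def build_index(subtrees):
    """Build an inverted index: lowercase token -> set of subtree positions."""
    index = defaultdict(set)
    for pos, subtree in enumerate(subtrees):
        for text in subtree.itertext():
            for token in text.lower().split():
                index[token].add(pos)
    return index


def top_k(index, keywords, k):
    """Rank subtrees by the number of distinct query keywords they contain.

    Assumption: this count is a stand-in for the paper's ranking function.
    Ties are broken by document order.
    """
    scores = defaultdict(int)
    for keyword in {w.lower() for w in keywords}:
        for pos in index.get(keyword, ()):
            scores[pos] += 1
    return heapq.nsmallest(k, scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Illustrative sample document (not taken from the paper).
DOC = """
<bib>
  <paper><title>XML keyword search</title><author>Li</author></paper>
  <paper><title>Relational keyword search</title><author>Zhou</author></paper>
  <paper><title>Graph databases</title><author>Feng</author></paper>
</bib>
"""

root = ET.fromstring(DOC.strip())
subtrees = partition(root, SI_ROOT_TAGS)
index = build_index(subtrees)
results = top_k(index, ["XML", "keyword"], k=2)
for pos, score in results:
    print(score, subtrees[pos].findtext("title"))
```

Subtrees that contain only some of the keywords still appear in the ranking, which mirrors the abstract's statement that MSI-Trees contain "all or some" of the input keywords. The inverted index means a query touches only the posting sets of its keywords rather than scanning every subtree.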
