No Tag, a Little Nesting, and Great XML Keyword Search

Keyword search, borrowed from Information Retrieval (IR), is one of the most convenient ways for ordinary users to find information of interest. As XML data becomes increasingly widespread, adapting keyword search to XML data has become an active research topic. In this paper, we first explore a nesting mechanism for XML keyword search that asks only a little nesting skill of the user. This approach has several benefits. First, it is convenient for ordinary users, who need no knowledge of how the target XML data is organized. Second, the nesting pattern can easily be transformed into structural hints, because it follows the same hierarchical mechanism as the XML data model itself. Finally, since no label (tag) information is required, XML fragments can be retrieved from documents with different schemas. This paper also proposes a new method for measuring the similarity of retrieved XML fragments, which may come from different schemas. Its core is the KCAM (Keyword Common Ancestor Matrix) structure, which stores, for each pair of keywords, the level of their SLCA (Smallest Lowest Common Ancestor) node. By mapping XML fragments into KCAMs, structural similarity can be computed as a matrix distance. KCAM distance fits naturally with the nested keyword query method.
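The KCAM idea can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation, and it rests on several assumptions the abstract does not state: keywords are matched as whole words against element text only (tags are ignored); the root is at level 1; when a keyword occurs more than once, the deepest common ancestor over all occurrence pairs is taken as the SLCA; the diagonal entry holds the level of the keyword's deepest occurrence; a missing keyword yields 0; and the matrix distance is the Frobenius norm of the difference. The helper names (`match_paths`, `lca_level`, `kcam`, `kcam_distance`) are hypothetical, and the paper's exact definitions may differ.

```python
import math
import xml.etree.ElementTree as ET
from itertools import product


def match_paths(root, keyword):
    """Hypothetical helper: root-to-node paths for every element whose own
    text contains `keyword` as a whole word (case-insensitive). Tag names
    are never inspected, mirroring the tag-free query style."""
    matches = []

    def walk(node, path):
        path = path + [node]
        if keyword.lower() in (node.text or "").lower().split():
            matches.append(path)
        for child in node:
            walk(child, path)

    walk(root, [])
    return matches


def lca_level(path_a, path_b):
    """Level of the lowest common ancestor of two root-to-node paths
    (root = level 1), i.e. the length of their shared prefix."""
    level = 0
    for a, b in zip(path_a, path_b):
        if a is not b:
            break
        level += 1
    return level


def kcam(xml_text, keywords):
    """Build an n x n keyword common ancestor matrix for one XML fragment.
    Entry (i, j) is the level of the deepest LCA over all occurrence pairs
    of keywords i and j (assumed SLCA choice); 0 if either is absent.
    On the diagonal this is the level of the keyword's deepest occurrence."""
    root = ET.fromstring(xml_text)
    paths = [match_paths(root, k) for k in keywords]
    n = len(keywords)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            levels = (lca_level(a, b) for a, b in product(paths[i], paths[j]))
            matrix[i][j] = max(levels, default=0)
    return matrix


def kcam_distance(m1, m2):
    """Assumed matrix distance: Frobenius norm of the element-wise difference."""
    return math.sqrt(sum((x - y) ** 2
                         for row1, row2 in zip(m1, m2)
                         for x, y in zip(row1, row2)))


# Two fragments with unrelated schemas but the same keywords.
frag_a = "<book><title>XML search</title><author>Smith</author></book>"
frag_b = ("<paper><info><name>Smith</name>"
          "<topic>XML search</topic></info></paper>")
keywords = ["xml", "smith"]

ka, kb = kcam(frag_a, keywords), kcam(frag_b, keywords)
print(ka)                      # [[2, 1], [1, 2]]
print(kb)                      # [[3, 2], [2, 3]]
print(kcam_distance(ka, kb))   # 2.0
```

The two fragments share no tag names, yet their KCAMs are directly comparable because they record only nesting levels. Here every entry differs by one level, so the assumed Frobenius distance is 2.0.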
