Element Retrieval Using Namespace Based on Keyword Search over XML Documents

Keyword search over XML elements is steadily gaining popularity. Traditional similarity measures are widely used to retrieve XML documents effectively, and a number of authors have proposed similarity-measure methods that take advantage of the structure and content of XML documents. However, these methods do not consider the similarity between the latent semantic information of element texts and that of the keywords in a query. In addition, although many algorithms for XML element search are available, some of them have high computational complexity because they search a huge number of elements. In this paper, we propose a new algorithm that uses the semantic similarity between elements rather than between entire XML documents, considering not only the structure and content of an XML document but also the semantic information of the namespaces in its elements. We compare our algorithm with three other algorithms on real datasets. The experiments show that the proposed method improves query accuracy and reduces running time.
