A multi-ranker model for adaptive XML searching

Advances in computing technology have made it feasible to offer ubiquitous access to Web information through many kinds of interaction devices, such as PCs, laptops, and palmtops. As XML has become a de facto standard for exchanging Web data, an interesting and practical research problem is the development of models and techniques that satisfy varied needs and preferences in searching XML data. In this paper, we employ a list of simple XML-tagged keywords as a vehicle for searching XML fragments in a collection of XML documents. To deal with the diverse nature of XML documents and of user preferences, we propose a novel multi-ranker model (MRM), which abstracts a spectrum of important XML properties and adapts these features to different XML search needs. The MRM is composed of three ranking levels. The lowest level consists of two categories of features: similarity and granularity. At the intermediate level, we define four tailored XML rankers (XRs), which are built from different lower-level features and have different strengths in searching XML fragments. The XRs are trained via a learning mechanism called the Ranking Support Vector Machine in a voting Spy Naïve Bayes framework (RSSF). The RSSF takes as input a set of labeled fragments and feature vectors and, through the learning process, generates Adaptive Rankers (ARs) as output. The ARs are defined over the XRs and form the top level of the MRM. We show empirically that the RSSF improves the MRM significantly while requiring only a small set of training XML fragments. We also demonstrate that the trained MRM brings out the strengths of the individual XRs in order to adapt to different preferences and queries.
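The three-level structure described above can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the paper's implementation: the feature names, the XR names, and all weights are hypothetical, and the weighted sum at the top level stands in for whatever combination the RSSF actually learns.

```python
from typing import Dict, List

# Level 1 (lowest): each fragment is described by feature values.
# Feature names are hypothetical stand-ins for the similarity ("sim_*")
# and granularity ("gran_*") feature categories.
Features = Dict[str, float]

# Level 2 (intermediate): four XML rankers (XRs), each scoring a fragment
# from a different mix of lower-level features. Names and weights are
# illustrative assumptions, not taken from the paper.
XR_WEIGHTS: Dict[str, Features] = {
    "xr_content": {"sim_content": 1.0},
    "xr_structure": {"sim_structure": 1.0},
    "xr_size": {"gran_size": 1.0},
    "xr_mixed": {"sim_content": 0.5, "gran_depth": 0.5},
}


def xr_scores(features: Features) -> Dict[str, float]:
    """Score one fragment with every XR; missing features count as 0."""
    return {
        name: sum(w * features.get(key, 0.0) for key, w in weights.items())
        for name, weights in XR_WEIGHTS.items()
    }


def adaptive_rank(fragments: Dict[str, Features],
                  ar_weights: Dict[str, float]) -> List[str]:
    """Level 3 (top): an adaptive ranker (AR) defined over the XRs.

    Here the AR is assumed to be a weighted sum of XR scores; in the paper
    the ARs are produced by the RSSF learning process instead.
    Returns fragment ids ordered from highest to lowest AR score.
    """
    def ar_score(features: Features) -> float:
        scores = xr_scores(features)
        return sum(ar_weights.get(name, 0.0) * s for name, s in scores.items())

    return sorted(fragments, key=lambda fid: ar_score(fragments[fid]), reverse=True)


# Two toy fragments with made-up feature values.
fragments = {
    "a": {"sim_content": 0.9, "gran_size": 0.1},
    "b": {"sim_content": 0.2, "gran_size": 0.8},
}
content_first = adaptive_rank(fragments, {"xr_content": 1.0})
size_first = adaptive_rank(fragments, {"xr_size": 1.0})
```

With these toy values, an AR that weights only the content-oriented XR and an AR that weights only the size-oriented XR order the two fragments differently, which is the sense in which different ARs can serve different preferences over the same set of XRs.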
