TopX: efficient and versatile top-k query processing for text, structured, and semistructured data

This paper presents a comprehensive overview of the TopX search engine, an extensive framework for unified indexing and querying of large collections of unstructured, semistructured, and structured data. Positioned at the intersection of database (DB) engineering and information retrieval (IR), TopX integrates efficient scheduling algorithms for top-k-style ranked retrieval with powerful scoring models, as well as dynamic and self-throttling query expansion facilities.
