Rewrite rules for search database systems

The results of a search engine can be improved by consulting auxiliary data. In a search database system, the association between the user query and the auxiliary data is driven by rewrite rules that augment the user query with a set of alternative queries. This paper develops a framework that formalizes the notion of a rewrite program, which is essentially a collection of hedge-rewriting rules. When applied to a search query, the rewrite program produces a set of alternative queries that constitutes a least fixpoint (lfp). The paper focuses on the lfp-convergence of rewrite programs: a rewrite program is lfp-convergent if the least fixpoint of every search query is finite. Determining whether a given rewrite program is lfp-convergent is undecidable. To address this, the paper proposes a safety condition and shows that safety guarantees lfp-convergence and can be decided in polynomial time. The effectiveness of the safety condition in capturing lfp-convergence is illustrated by applying it to a rewrite program in an implemented system that is intended for widespread use.
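The idea of a least fixpoint of alternative queries can be illustrated with a small sketch. It is deliberately simplified and is not the paper's formalism: queries are flat tuples of tokens rather than hedges, rules are plain substring replacements, and the names `rewrite_once`, `least_fixpoint`, `max_size`, and the sample rules are all hypothetical. Because lfp-convergence is undecidable in general, the sketch simply gives up once the set of queries exceeds a size bound.

```python
from collections import deque

def rewrite_once(query, rules):
    """Yield every query obtained by applying one rule at one position.

    query: tuple of tokens; rules: list of (lhs, rhs) pairs of token tuples.
    Assumption: every lhs is non-empty.
    """
    for lhs, rhs in rules:
        n = len(lhs)
        for i in range(len(query) - n + 1):
            if query[i:i + n] == lhs:
                yield query[:i] + rhs + query[i + n:]

def least_fixpoint(query, rules, max_size=1000):
    """Return the set of all queries reachable from `query`, or None if the
    set grows beyond max_size (a stand-in for possible non-convergence)."""
    seen = {query}
    frontier = deque([query])
    while frontier:
        current = frontier.popleft()
        for alt in rewrite_once(current, rules):
            if alt not in seen:
                if len(seen) >= max_size:
                    return None  # bound exceeded; the closure may be infinite
                seen.add(alt)
                frontier.append(alt)
    return seen

# Hypothetical rules: each one adds an alternative phrasing.
RULES = [
    (("nyc",), ("new", "york")),
    (("hotel",), ("lodging",)),
]

alternatives = least_fixpoint(("nyc", "hotel"), RULES)

# A rule that keeps growing the query never reaches a finite fixpoint.
diverging = least_fixpoint(("a",), [(("a",), ("a", "a"))], max_size=50)
```

Here `alternatives` contains the original query together with its three rewritten variants, while `diverging` is `None` because the single rule keeps producing longer queries. A size bound like this can only detect that a particular query has not converged yet; the paper's safety condition is instead a check on the rewrite program itself.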