Word proximity constraints: information retrieval meets temporal reasoning

We study the data models WP and AWP, which have been widely used for many years in information retrieval. WP and AWP can be used to represent and query textual information under the Boolean model, using attributes whose values are of type text together with word proximity constraints. Variations of WP and AWP are in use in most deployed digital libraries based on the Boolean model, in text extenders for relational database systems (e.g., Oracle Text), and in the search engine AltaVista. We present the syntax, semantics, and model theory of WP and AWP and analyze the complexity of query satisfiability and entailment. Because word proximity constraints closely resemble temporal constraints, the techniques we use in our analysis are similar to those developed in previous work on first-order theories of temporal constraints and temporal constraint databases.
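To make the analogy with temporal constraints concrete, the following is a minimal Python sketch under simplifying assumptions; it is not the formal WP/AWP syntax or semantics. A text value is taken to be a list of words, and a proximity constraint (the hypothetical `Proximity` type) requires one word to occur between `lo` and `hi` positions after another. Word positions then play the role of time points, and each constraint is a bounded difference between two of them, the same shape as a temporal difference constraint. The names `Proximity`, `positions`, and `satisfies`, as well as the distance convention (plain position difference), are assumptions made for illustration.

```python
from itertools import product
from typing import NamedTuple


class Proximity(NamedTuple):
    """Hypothetical proximity constraint: `second` must occur between `lo` and
    `hi` positions after `first`, i.e. lo <= pos(second) - pos(first) <= hi.
    Treating positions as time points, this is a difference constraint."""
    first: str
    second: str
    lo: int
    hi: int


def positions(doc: list[str]) -> dict[str, list[int]]:
    """Map each word to the list of positions at which it occurs."""
    occ: dict[str, list[int]] = {}
    for i, word in enumerate(doc):
        occ.setdefault(word, []).append(i)
    return occ


def satisfies(doc: list[str], constraints: list[Proximity]) -> bool:
    """Return True if one occurrence can be chosen for each constrained word so
    that every constraint holds at once (brute-force search over choices)."""
    occ = positions(doc)
    words = sorted({w for c in constraints for w in (c.first, c.second)})
    if any(w not in occ for w in words):
        return False  # some constrained word never occurs in the text
    for choice in product(*(occ[w] for w in words)):
        pos = dict(zip(words, choice))  # one position ("time point") per word
        if all(c.lo <= pos[c.second] - pos[c.first] <= c.hi for c in constraints):
            return True
    return False
```

For example, with `doc = "efficient filtering of textual information in peer networks".split()`, the constraint `Proximity("filtering", "information", 1, 3)` is satisfied because "information" occurs three positions after "filtering", whereas `Proximity("information", "filtering", 0, 5)` is not, since the order is reversed. The search is exponential in the number of distinct constrained words; it only illustrates the constraint structure and says nothing about the complexity results analyzed in the paper.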