Introduction to the special issue on database and information retrieval integration

The goal of having a common platform for dealing with both structured and unstructured data is a longstanding one, going back to the 1960s. A number of approaches have been suggested, from both the database and information retrieval (IR) perspectives, but the motivation for finding a solution or solutions that work has grown tremendously since the advent of very large-scale Web databases. Areas that were once the exclusive concerns of IR, such as statistical inference and ranking, have now become important topics for database researchers, and both communities have a common interest in providing efficient indexing and optimization techniques for Web-scale data. Exploiting document structure is a critical part of Web search, and combining different sources of evidence effectively is an important part of many database applications. There are many possibilities for integration, such as extending a database model to deal more effectively with probabilities, extending an IR model to handle more complex structures and multiple relations, or developing a unified model and system. Applications such as Web search, e-commerce, and data mining provide the testbeds where these proposals can be evaluated and compared.

The papers in this special issue cover a range of topics related to database and IR integration. To provide some context, it is worth briefly reviewing some of the work that was done in the past, particularly in the more distant pre-Web days. From an IR perspective, dealing with structure started in the 1970s with commercial search services such as MEDLINE and DIALOG that had Boolean field restrictions.
