MIRE: a multidimensional information retrieval engine for structured data and text

This paper presents an original information retrieval engine, called MIRE, for integrating structured data and text. Among other things, MIRE is designed to work naturally and efficiently with the inherent hierarchies of structured data. Although multidimensional access methods were originally developed for spatial applications, they can be used successfully to index hierarchical structured data and to give an existing information retrieval engine the ability to navigate hierarchical dimensions. To support this capability, MIRE enhances the processing algorithms of an existing multidimensional access method so that they avoid overflow and support hierarchical dimensions. Compared with a search engine that maintains separate indexes for different types of search, the multidimensional approach shows a significant reduction in the number of page accesses over a large document collection.
