A Common Access Structure for Standard Attributes and Document Representations in Vector Space

Next-generation information systems will see a coalescence of (object-oriented) database management systems and information retrieval systems. In particular, integrating content-based retrieval techniques into object-oriented database management systems is an interesting requirement in this respect. Whereas the query-language and data-model aspects of this coalescence have been addressed in some recent papers, approaches dealing with integration at the physical level are missing. In the present paper we propose a common access structure that supports a content-based similarity search with additional conditions on standard attributes in one homogeneous step. To this end, we use a k-d-tree-based multi-attribute access structure that holds the standard attributes in the first dimensions and the components of a document description vector in the higher dimensions. We describe the algorithms for this access structure and present first performance results.
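The combined search can be illustrated with a minimal sketch. It is not the algorithm of the paper: it assumes a plain in-memory k-d-tree with median splits, range conditions on the standard attributes, and squared Euclidean distance on the document vector components as the similarity measure. The names `build`, `search`, `n_std`, and `ranges` are hypothetical and chosen only for this illustration.

```python
import heapq
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]  # (standard attributes..., document vector components...)


class Node:
    def __init__(self, point: Point, dim: int,
                 left: "Optional[Node]", right: "Optional[Node]") -> None:
        self.point, self.dim, self.left, self.right = point, dim, left, right


def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Build a k-d-tree that cycles through all dimensions (median split)."""
    if not points:
        return None
    dim = depth % len(points[0])
    points = sorted(points, key=lambda p: p[dim])
    mid = len(points) // 2
    return Node(points[mid], dim,
                build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))


def search(root: Optional[Node], n_std: int,
           ranges: Sequence[Tuple[float, float]],
           query: Sequence[float], k: int) -> List[Point]:
    """Return the k points closest to `query` in the document dimensions
    among those whose first `n_std` attributes lie in `ranges`."""
    heap: List[Tuple[float, Point]] = []  # max-heap via negated squared distance

    def visit(node: Optional[Node]) -> None:
        if node is None:
            return
        p, d = node.point, node.dim
        v = p[d]
        # Check the range conditions and the distance for this node's point.
        if all(lo <= p[i] <= hi for i, (lo, hi) in enumerate(ranges)):
            dist2 = sum((a - b) ** 2 for a, b in zip(p[n_std:], query))
            if len(heap) < k:
                heapq.heappush(heap, (-dist2, p))
            elif dist2 < -heap[0][0]:
                heapq.heapreplace(heap, (-dist2, p))
        if d < n_std:
            # Standard attribute: prune subtrees outside the range condition.
            lo, hi = ranges[d]
            if lo <= v:
                visit(node.left)
            if hi >= v:
                visit(node.right)
        else:
            # Document dimension: nearer side first, prune the far side
            # if it cannot improve on the current k-th best distance.
            diff = query[d - n_std] - v
            near, far = ((node.left, node.right) if diff < 0
                         else (node.right, node.left))
            visit(near)
            if len(heap) < k or diff * diff < -heap[0][0]:
                visit(far)

    visit(root)
    return [p for _, p in sorted(heap, key=lambda t: -t[0])]
```

With one standard attribute (for example a publication year) followed by three vector components, a call such as `search(build(points), 1, [(1994.0, 1996.0)], (0.5, 0.5, 0.5), 5)` evaluates the attribute condition and the similarity ranking in a single tree traversal instead of filtering first and ranking afterwards.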
