Towards Content-Related Indexing in Databases

Modern business applications require huge volumes of high-dimensional data to be stored. Explorative queries, typically used in these applications, select groups of objects with similar attributes or attribute combinations. In contrast to multidimensional index structures designed for spatial data, which assume dimension independence and very often a uniform distribution, we have developed a new database indexing concept that discovers correlation patterns and takes the nonuniform distribution into consideration. The corresponding analysis is done on the subsymbolic level by applying a hierarchical artificial neural network. The trained neural network organises the data into a hierarchy of clusters. The clusters can be interpreted as groups of similar objects on the symbolic level. The hierarchy is finally used to derive the Intelligent Cluster Index (ICIx). In this paper we present a description of the Intelligent Cluster Index, its creation, and its application as a multidimensional index and as a heuristic for a logical distribution schema. We describe first experimental results, which show that this new approach can significantly improve system performance.
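To illustrate the general idea of using a cluster hierarchy as an index, the following is a minimal sketch, not the ICIx implementation. It assumes a simple recursive 2-means split as a stand-in for the hierarchical neural network described above, and all names (Node, two_means, build, range_query) are hypothetical. Each node stores a centroid and a covering radius, so a similarity query can skip whole clusters that cannot contain a match.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

class Node:
    """One cluster: centroid, covering radius, and either children or points."""
    def __init__(self, points):
        self.center = centroid(points)
        self.radius = max(dist(self.center, p) for p in points)
        self.points = points   # kept only in leaf nodes
        self.children = []

def two_means(points, iterations=10):
    """Split points into two groups (assumed stand-in for the neural network)."""
    a = points[0]
    b = max(points, key=lambda p: dist(a, p))  # deterministic seeding
    for _ in range(iterations):
        left = [p for p in points if dist(p, a) <= dist(p, b)]
        right = [p for p in points if dist(p, a) > dist(p, b)]
        if not left or not right:
            break
        a, b = centroid(left), centroid(right)
    return left, right

def build(points, leaf_size=4):
    """Recursively organise the data into a hierarchy of clusters."""
    node = Node(points)
    if len(points) > leaf_size:
        left, right = two_means(points)
        if left and right:  # only split when both halves are non-empty
            node.children = [build(left, leaf_size), build(right, leaf_size)]
            node.points = []
    return node

def range_query(node, q, r):
    """Return all indexed points within distance r of q."""
    # Triangle inequality: no point of this cluster can lie within r of q.
    if dist(q, node.center) - node.radius > r:
        return []
    if not node.children:
        return [p for p in node.points if dist(p, q) <= r]
    hits = []
    for child in node.children:
        hits.extend(range_query(child, q, r))
    return hits

# Two well-separated groups of similar objects (synthetic example data).
data = [(x * 0.1, y * 0.1) for x in range(5) for y in range(5)]
data += [(10 + x * 0.1, 10 + y * 0.1) for x in range(5) for y in range(5)]

index = build(data)
query, radius = (10.2, 10.2), 0.15
result = range_query(index, query, radius)
brute_force = [p for p in data if dist(p, query) <= radius]
```

The pruning step is where such an index can save work: clusters far from the query are discarded without inspecting their members, while the result remains identical to a linear scan.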
