Self-adaptive, on-line reclustering of complex object data

A likely trend in the development of future CAD, CASE and office information systems will be the use of object-oriented database systems to manage their internal data stores. The entities that these applications retrieve, such as electronic parts and their connections or customer service records, are typically large complex objects composed of many interconnected heterogeneous objects, rather than thousands of tuples. Because these applications are interactive, their usage patterns may shift widely over time. Such applications demand clustering methods that are suited to large complex objects and that can adapt on-line to shifting usage patterns. While most object-oriented clustering methods allow heterogeneous objects to be grouped, the resulting clustering is usually static and can only be changed off-line. We present one possible architecture for reclustering complex objects on-line in a manner that adapts to changing usage patterns. The architecture decomposes a clustering method into concurrently operating components, each of which handles one of the fundamental tasks involved in reclustering: statistics collection, cluster analysis, and reorganization. We present the results of an experiment performed to evaluate the architecture's behavior. These results show that the average miss rate for object accesses can be effectively reduced using a combination of rules that we have developed for deciding when cluster analyses and reorganizations should be performed.
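The three-task decomposition described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the names `StatsCollector`, `analyze`, `miss_rate` and `run`, the back-to-back co-access statistic, the greedy grouping heuristic, the one-page buffer model, and the window and threshold trigger rule are all hypothetical. For clarity the sketch runs the three tasks sequentially, whereas the architecture described above runs them as concurrent components.

```python
from collections import Counter

PAGE_CAPACITY = 2  # hypothetical: number of objects that fit on one page


class StatsCollector:
    """Statistics collection: count objects accessed back to back."""

    def __init__(self):
        self.co_access = Counter()
        self.previous = None

    def record(self, obj):
        if self.previous is not None and self.previous != obj:
            self.co_access[frozenset((self.previous, obj))] += 1
        self.previous = obj


def analyze(co_access, objects, capacity=PAGE_CAPACITY):
    """Cluster analysis: greedily merge the most co-accessed groups first."""
    group_of = {obj: [obj] for obj in objects}
    for pair, _count in co_access.most_common():
        a, b = tuple(pair)
        ga, gb = group_of[a], group_of[b]
        if ga is not gb and len(ga) + len(gb) <= capacity:
            ga.extend(gb)
            for obj in gb:
                group_of[obj] = ga
    # Give each distinct group its own page number.
    placement, seen, page = {}, set(), 0
    for obj in objects:
        group = group_of[obj]
        if id(group) not in seen:
            seen.add(id(group))
            for member in group:
                placement[member] = page
            page += 1
    return placement


def miss_rate(trace, placement):
    """Fraction of accesses that miss a one-page buffer (a simplification)."""
    buffered, misses = None, 0
    for obj in trace:
        if placement[obj] != buffered:
            misses += 1
            buffered = placement[obj]
    return misses / len(trace)


def run(trace, placement, window=4, threshold=0.5):
    """Recluster whenever a window's miss rate exceeds the threshold."""
    stats = StatsCollector()
    reorganizations = 0
    for start in range(0, len(trace), window):
        chunk = trace[start:start + window]
        for obj in chunk:
            stats.record(obj)
        if miss_rate(chunk, placement) > threshold:  # hypothetical trigger rule
            # Reorganization: adopt the placement proposed by the analysis.
            placement = analyze(stats.co_access, sorted(placement))
            reorganizations += 1
    return placement, reorganizations
```

Calling `run` on an access trace and an initial object-to-page mapping returns the adapted mapping and the number of reorganizations performed. In a real system the reorganization step would move objects on disk, which this sketch models only as replacing a dictionary.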
