A Note on Interfacing Object Warehouses and Mass Storage Systems for Data Mining Applications

Data mining is the automatic discovery of patterns, associations, and anomalies in data sets, and it requires numerically and statistically intensive queries. We assume that supporting these queries requires a specialized data management infrastructure and that, because of the size of the data involved, this infrastructure is layered over a hierarchical storage system. In this paper, we discuss the architecture of a system that is layered for modularity but exploits specialized lightweight services to maintain efficiency. For example, rather than use a full-function database, we use lightweight object services specialized for data mining. We propose placing information repositories between layers so that components on either side of a layer boundary can consult them when making decisions about data layout, the caching and migration of data, the scheduling of queries, and related matters.
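
The idea of an information repository shared across a layer boundary can be illustrated with a minimal sketch. This is not the system described in the paper: the class `InfoRepository`, the functions `migration_candidates` and `schedule_queries`, and the notion of named data "segments" are all hypothetical, chosen only to show how an object layer and a storage layer might each write hints into a repository and read the other side's hints when making decisions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InfoRepository:
    """Hypothetical repository between an object layer and a storage layer."""
    # Written by the object layer: how often each segment has been queried.
    access_counts: Dict[str, int] = field(default_factory=dict)
    # Written by the storage layer: whether each segment is on disk (vs. tape).
    on_disk: Dict[str, bool] = field(default_factory=dict)

    def record_access(self, segment: str) -> None:
        self.access_counts[segment] = self.access_counts.get(segment, 0) + 1

    def record_location(self, segment: str, disk_resident: bool) -> None:
        self.on_disk[segment] = disk_resident


def migration_candidates(repo: InfoRepository, threshold: int) -> List[str]:
    """Storage-side decision: segments not on disk that were accessed
    at least `threshold` times, as candidates for migration to disk."""
    return sorted(
        seg for seg, count in repo.access_counts.items()
        if count >= threshold and not repo.on_disk.get(seg, False)
    )


def schedule_queries(repo: InfoRepository, queries: Dict[str, List[str]]) -> List[str]:
    """Object-side decision: order queries so that those needing the fewest
    non-disk-resident segments run first (ties broken by query id)."""
    def missing(query_id: str) -> int:
        return sum(1 for seg in queries[query_id] if not repo.on_disk.get(seg, False))

    return sorted(queries, key=lambda query_id: (missing(query_id), query_id))
```

In this sketch neither layer calls the other directly. Each one records what it knows and reads what the other has recorded, which is one way to keep the layers modular while still sharing the information needed for layout, caching, migration, and scheduling decisions.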
