JTool: Accessing Warehoused Collections of Objects with Java¹

The purpose of the work described here is to gain experimental experience with data warehouses for large collections of Java objects. We report on the design, architecture, and early experimental work with a software tool called JTool for creating data warehouses of Java objects. Our primary interest is in building distributed data warehouses containing large collections of Java objects as a basis for data mining of objects on the web. This work builds broadly on our prior work with a software tool called PTool, which we have used for data mining of large collections of C++ objects in clustered computing environments [Grossman 1996 and 1997a]. With Version 0.2 of JTool, we have built gigabyte-size data warehouses of Java objects and have shown that JTool scales linearly with the size of the warehouse and with the size and complexity of the underlying objects. Unfortunately, because of the overhead of the Java Virtual Machine and our use of the object serialization supported by JDK 1.1.1, querying a gigabyte warehouse of Java objects takes approximately 15 hours, compared with minutes using PTool.

¹ This research was supported by grants from the National Science Foundation and the Department of Energy.
² Robert Grossman is also a member of the technical staff at Magnify, Inc.
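The paragraph above attributes much of JTool 0.2's query cost to the Java Virtual Machine and to JDK object serialization. The following is a minimal sketch, not JTool's actual implementation, of a warehouse stored as one stream of serialized objects and queried by a full scan. The class `SerializedWarehouse`, the record type `Event`, and its fields are hypothetical. The point it illustrates is that every object must be reconstructed by `ObjectInputStream.readObject` before a query predicate can even be tested. The sketch assumes modern Java (generics, lambdas, try-with-resources) rather than the JDK 1.1.1 API level used in the paper.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical flat-file "warehouse" of serialized Java objects (not JTool's API). */
public class SerializedWarehouse {

    /** Illustrative record type; the fields are assumptions for this sketch. */
    static class Event implements Serializable {
        private static final long serialVersionUID = 1L;
        final int id;
        final double energy;

        Event(int id, double energy) {
            this.id = id;
            this.energy = energy;
        }
    }

    /** Writes an object count, then every object, to a single serialization stream. */
    static void store(File file, List<? extends Serializable> objects) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeInt(objects.size());
            for (Serializable obj : objects) {
                out.writeObject(obj);
            }
        }
    }

    /** Full scan: each object is deserialized before the predicate is applied to it. */
    static <T> List<T> query(File file, Class<T> type, Predicate<T> predicate)
            throws IOException, ClassNotFoundException {
        List<T> matches = new ArrayList<>();
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                T obj = type.cast(in.readObject()); // the costly step for large warehouses
                if (predicate.test(obj)) {
                    matches.add(obj);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("warehouse", ".ser");
        file.deleteOnExit();
        List<Event> events = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            events.add(new Event(i, i * 0.5));
        }
        store(file, events);
        List<Event> high = query(file, Event.class, e -> e.energy > 400.0);
        System.out.println(high.size() + " matching events"); // prints "199 matching events"
    }
}
```

Because the query has no index and no way to inspect an object's fields without deserializing it, its running time grows with the total number and size of stored objects, which is consistent with the linear scaling reported above.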