Scaling the Data Mining Step in Knowledge Discovery Using Oceanographic Data

Knowledge discovery from large acoustic images is computationally intensive. Most of the computation is spent in the data-mining step of the knowledge discovery process, which involves unsupervised learning (clustering). We have developed a technique that partitions the data, distributes the partitions to different processors for independent training, and then trains a single system to join the results of the independent categorizers. We report preliminary results using this approach for knowledge discovery on large acoustic images with more than 10,000 training instances.
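
The partition, train, and join pattern described above can be illustrated with a minimal sketch. It is not the authors' implementation. It rests on several assumptions: plain k-means stands in for the clustering method, the partitions are processed sequentially rather than on separate processors, and the "single system" that joins the results is taken to be a second k-means run over the pooled centroids of the independent categorizers. All function names (`kmeans`, `partition`, `train_partitioned`, `classify`) are hypothetical.

```python
import random


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
    )


def mean(group):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(group)
    return tuple(sum(column) / n for column in zip(*group))


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (an assumed stand-in clusterer); returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Keep the previous centroid if a group ends up empty.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


def partition(points, n_parts):
    """Split the training instances into n_parts interleaved subsets."""
    return [points[i::n_parts] for i in range(n_parts)]


def train_partitioned(points, n_parts, k):
    """Train one categorizer per partition, then join their results."""
    # Independent training: each call could run on a different processor.
    local = [
        kmeans(part, k, seed=i)
        for i, part in enumerate(partition(points, n_parts))
    ]
    # Joining step (assumption): cluster the pooled local centroids.
    pooled = [c for centroids in local for c in centroids]
    return kmeans(pooled, k)


def classify(point, centroids):
    """Assign a point to the category of its nearest joined centroid."""
    return nearest(point, centroids)
```

In this sketch the joining step sees only `n_parts * k` centroids rather than the full training set, so its cost is small compared with the per-partition training. Whether categories learned this way match those learned from the unpartitioned data is exactly the kind of question the preliminary results address, and the sketch makes no claim about it.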
