SCALABLE KNOWLEDGE DISCOVERY FROM OCEANOGRAPHIC DATA

Knowledge discovery from large acoustic images is computationally intensive. A successful approach to parallelizing supervised learning algorithms is to partition the data and distribute the partitions to multiple processors, each running a learning algorithm. A voting scheme or tree construction technique then combines the results of the classifiers to predict the class of an instance. Systems built this way have proven effective both in reducing computation time and in improving classification results. We have developed a technique that applies this approach to unsupervised learning tasks. The process is more complicated for unsupervised learning, because one must determine a correspondence between the classes learned by the different classifiers and decide how to combine those classes. We report preliminary results from applying this approach to knowledge discovery with large acoustic images, where the number of instances to be classified exceeds 10,000 and scalability is an important concern.
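
The partition, learn, match, and combine steps described above can be sketched as follows. This is a minimal illustration only, not the technique developed in this work: it assumes k-means as the unsupervised learner, a greedy nearest-centroid rule for the correspondence between classes, and a size-weighted average for combining matched classes. All function names and the synthetic two-cluster data are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=10):
    """Plain k-means on a list of equal-length tuples.

    Returns (centroids, counts), where counts[j] is the size of cluster j.
    """
    # Farthest-first initialisation: deterministic, spreads the seeds out.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return centroids, [len(g) for g in groups]


def match_to_reference(reference, centroids):
    """Greedy one-to-one matching of centroids to reference centroids.

    Returns mapping[i] = index in `reference` assigned to centroids[i].
    """
    pairs = sorted(
        (math.dist(c, r), i, j)
        for i, c in enumerate(centroids)
        for j, r in enumerate(reference)
    )
    mapping, used = {}, set()
    for _, i, j in pairs:
        if i not in mapping and j not in used:
            mapping[i] = j
            used.add(j)
    return mapping


def combine(partitions, k):
    """Cluster each partition, align the classes, and merge them."""
    # Each kmeans call is independent of the others (assumption of this sketch:
    # they are run sequentially here rather than on separate processors).
    results = [kmeans(part, k) for part in partitions]
    reference = results[0][0]  # classes of the first partition act as labels
    dim = len(reference[0])
    sums = [[0.0] * dim for _ in range(k)]
    totals = [0] * k
    for centroids, counts in results:
        for i, j in match_to_reference(reference, centroids).items():
            totals[j] += counts[i]
            for d in range(dim):
                sums[j][d] += centroids[i][d] * counts[i]
    # Size-weighted average of matched centroids.
    return [
        tuple(v / t for v in row) if t else reference[j]
        for j, (row, t) in enumerate(zip(sums, totals))
    ]


def make_demo_data(n_per_blob=200, seed=1):
    """Synthetic stand-in for image instances: two well-separated 2-D blobs."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 10.0)]
    data = [
        (cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
        for cx, cy in centers
        for _ in range(n_per_blob)
    ]
    rng.shuffle(data)
    return data


data = make_demo_data()
partitions = [data[i::4] for i in range(4)]  # four disjoint partitions
combined = sorted(combine(partitions, k=2))
print(combined)
```

Because each per-partition run depends only on its own data, the calls to `kmeans` could be dispatched to separate processors; the correspondence step is what lets class labels from different partitions be treated as the same class before they are combined.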