S3Bagging: Fast Classifier Induction Method with Subsampling and Bagging

In the data mining process, human analysts often need to induce classifiers iteratively in order to extract valuable knowledge from data. Data mining tools therefore need to extract valid knowledge from large amounts of data quickly enough to keep pace with the analyst's demands. One approach to meeting this requirement is to reduce the training data size by subsampling. In many cases, however, the accuracy of the induced classifier degrades when the training data is subsampled. We propose S3Bagging (Small SubSampled Bagging), which combines subsampling with a committee learning method, namely Bagging. S3Bagging induces classifiers efficiently by reducing the training data size through subsampling and by processing in parallel. In addition, the accuracy of the classifier is maintained by aggregating the results of the individual classifiers through the Bagging process. The performance of S3Bagging is investigated through carefully designed experiments.
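
The idea of training several classifiers on small subsamples and aggregating their votes can be illustrated with a minimal Python sketch. It is not the authors' implementation. The nearest-centroid base learner, the function names (`train_centroid`, `predict_centroid`, `s3bagging_fit`, `s3bagging_predict`), the parameters `n_members` and `rate`, sampling with replacement, and plurality voting are all assumptions made for illustration.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def train_centroid(sample):
    """Stand-in base learner (an assumption): one mean vector per class."""
    sums, counts = {}, Counter()
    for x, y in sample:
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict_centroid(model, x):
    """Return the class whose centroid is closest to x (squared distance)."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(model[y], x)))


def s3bagging_fit(data, n_members=10, rate=0.1, seed=0):
    """Train n_members base classifiers, each on a small random subsample."""
    size = max(1, int(len(data) * rate))

    def fit_member(i):
        rng = random.Random(seed + i)  # independent, reproducible stream
        # Sampling with replacement is an assumption of this sketch.
        return train_centroid(rng.choices(data, k=size))

    # Members do not depend on each other, so they can be trained concurrently.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(fit_member, range(n_members)))


def s3bagging_predict(models, x):
    """Aggregate the member predictions by plurality vote."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]
```

Each member sees only a fraction `rate` of the data, so a single member is cheap to train, and the vote across members is what is meant to compensate for the accuracy lost to subsampling. The thread pool only illustrates that members are independent; in CPython, a process pool or separate machines would be needed for real CPU parallelism.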