Distributed Multi-class Rule Based Classification Using RIPPER

Traditional data mining (DM) faces several challenges, namely scalability, high dimensionality, and distributed data, and it often requires large amounts of computational resources, in both space and time, to extract the hidden patterns in the data. In addition, it assumes that the data are available at one location, whereas today data are often inherently distributed across several databases. Because bandwidth is limited, moving such data to a central site for processing is highly inefficient. Distributed computing therefore becomes very important for efficient DM in terms of both space and time: massive data can be mined in a non-centralized way that distributes the workload seamlessly among the available sites. In this paper, we propose distributed multi-class rule-based classification (DiRUC), an algorithm that runs repeated incremental pruning to produce error reduction (RIPPER) at the local level and then merges the results into a global model in a distributed manner. The algorithm first constructs local rule sets from the distributed data; then, at each iteration, the local models are sent from one site to another. Finally, the global model is constructed by efficiently merging these local models and is made available at each site for predicting class labels. We analyze the performance (accuracy and efficiency) of the algorithm on five data sets with different parameters, and the results show that DiRUC outperforms standard RIPPER and the island model of Ishibuchi et al.
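The abstract does not spell out how the local rule sets are merged, so the following is only a minimal sketch under stated assumptions, not the DiRUC procedure itself. It assumes each site has already induced rules (for example with RIPPER) and reports, for every rule, how many local examples it covered and how many of those it classified correctly. The names `Rule`, `merge_rule_sets`, and `predict`, the Laplace-corrected precision ranking, and the toy weather-style attributes are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule representation: a conjunction of attribute == value tests.
    # Conditions are kept as a sorted tuple so identical rules compare equal.
    conditions: Tuple[Tuple[str, str], ...]
    label: str
    covered: int  # local examples the rule fired on
    correct: int  # of those, how many carried the rule's label

    def matches(self, example: Dict[str, str]) -> bool:
        return all(example.get(attr) == val for attr, val in self.conditions)

    def quality(self) -> float:
        # Laplace-corrected precision (an assumed ranking measure).
        return (self.correct + 1) / (self.covered + 2)


def merge_rule_sets(local_models: List[List[Rule]]) -> List[Rule]:
    """Pool rules from all sites, sum the counts of duplicates, rank by quality."""
    pooled: Dict[Tuple[Tuple[Tuple[str, str], ...], str], Tuple[int, int]] = {}
    for model in local_models:
        for rule in model:
            key = (rule.conditions, rule.label)
            covered, correct = pooled.get(key, (0, 0))
            pooled[key] = (covered + rule.covered, correct + rule.correct)
    merged = [Rule(c, lab, cov, cor) for (c, lab), (cov, cor) in pooled.items()]
    # Best rules first; shorter rules win ties.
    merged.sort(key=lambda r: (-r.quality(), len(r.conditions)))
    return merged


def predict(rules: List[Rule], example: Dict[str, str], default: str) -> str:
    """Return the label of the first matching rule, else the default class."""
    for rule in rules:
        if rule.matches(example):
            return rule.label
    return default


# Two sites with toy local rule sets (made-up counts).
site_a = [
    Rule((("outlook", "sunny"),), "no", covered=10, correct=9),
    Rule((("windy", "false"),), "yes", covered=8, correct=6),
]
site_b = [
    Rule((("outlook", "sunny"),), "no", covered=6, correct=5),
    Rule((("outlook", "rain"), ("windy", "true")), "no", covered=4, correct=3),
]

global_model = merge_rule_sets([site_a, site_b])
print(predict(global_model, {"outlook": "sunny", "windy": "false"}, default="yes"))
```

In this sketch only rules and their counts travel between sites, never the raw data, which is the property the abstract motivates with limited bandwidth. A real merge would also have to resolve conflicting and overlapping rules, which the sketch handles only through the quality ordering.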

[1] Hisao Ishibuchi, et al. Ensemble Fuzzy Rule-Based Classifier Design by Parallel Distributed Fuzzy GBML Algorithms, 2012, SEAL.

[2] Sun Jiang'hong, et al. Large Rotating Machinery Fault Diagnosis and Knowledge Rules Acquiring Based on Improved RIPPER, 2009.

[3] Alexander Löser, et al. The GoOLAP Fact Retrieval Framework, 2011, eBISS.

[4] Jianwei Guo, et al. Research on Distributed Data Mining System Based on Hadoop Platform, 2014.

[5] Antonio Peregrín, et al. An Evolutionary Ensemble-Based Method for Rule Extraction with Distributed Data, 2009, HAIS.

[6] Viktor K. Prasanna, et al. Scalable regression tree learning on Hadoop using OpenPlanet, 2012, MapReduce '12.

[7] Haimonti Dutta, et al. Distributed Top-K Outlier Detection from Astronomy Catalogs using the DEMAC System, 2007, SDM.

[8] Vincent Cho, et al. Distributed Mining of Classification Rules, 2002, Knowledge and Information Systems.

[9] Abu Ahmed Ferdaus, et al. A Genetic Algorithm Approach using Improved Fitness Function for Classification Rule Mining, 2014.

[10] Teng-Sheng Moh, et al. Can You Judge a Man by His Friends? - Enhancing Spammer Detection on the Twitter Microblogging Platform Using Friends and Followers, 2010, ICISTM.

[11] Marek Sikora, et al. Data-Driven Adaptive Selection of Rules Quality Measures for Improving the Rules Induction Algorithm, 2011, RSFDGrC.

[12] Vladimir A Basiuk, et al. Self-assemblies of meso-tetraphenylporphine ligand on surfaces of highly oriented pyrolytic graphite and single-walled carbon nanotubes: insights from scanning tunneling microscopy and molecular modeling, 2011, Journal of nanoscience and nanotechnology.

[13] Telmo da Silva Morais. Survey on Frameworks for Distributed Computing: Hadoop, Spark and Storm, 2015.