A parallel fuzzy rule-base based decision tree in the framework of map-reduce

Abstract: Decision trees are commonly used for learning and extracting classification rules from data. Among them, the fuzzy rule based decision tree (FRDT) is a representative method owing to its better robustness and generalization. However, FRDT does not scale well to large-scale data sets. One solution to this problem is parallel computing, for which Map-Reduce is a proven, effective model. Ensemble learning, in turn, is an effective strategy that can significantly improve the generalization ability of machine learning systems. The objective of this paper is to develop a fuzzy rule-base based decision tree that builds on both parallel computing and ensemble learning. First, we implement a parallel fusing fuzzy rule based classification system via Map-Reduce (MR-FFRCS) to show how fuzzy rules can be extracted from data in parallel and evaluated in an ensemble learning manner. Then, taking MR-FFRCS as a fundamental module, we propose a parallel fuzzy rule-base based decision tree (MR-FRBDT) to improve the original FRDT algorithm. The experimental studies focus mainly on feasibility and parallelism. Compared with FRDT on 23 UCI benchmark data sets, the proposed MR-FRBDT algorithm, which has fewer parameters, is effective and able to handle large-scale data sets. Furthermore, numerical experiments on several large-scale data sets verify its parallel performance in reducing computing time and avoiding memory restrictions.
