Learning ELM-Tree from big data based on uncertainty reduction

A challenge in big data classification is the design of highly parallelized learning algorithms. One solution is to apply parallel computation to the individual components of a learning model. In this paper, we first propose an extreme learning machine tree (ELM-Tree) model based on the heuristics of uncertainty reduction. In the ELM-Tree model, information entropy and ambiguity serve as the uncertainty measures for splitting decision tree (DT) nodes. In addition, to resolve the over-partitioning problem in DT induction, ELMs are embedded as leaf nodes when the gain ratios of all available splits fall below a given threshold. We then apply parallel computation to five components of the ELM-Tree model, which effectively reduces the computational time for big data classification. Experimental studies demonstrate the effectiveness of the proposed method.
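The split-or-leaf rule described in the abstract can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation. The abstract gives no formulas, so we assume C4.5-style gain ratio computed from Shannon entropy. We also assume that "ambiguity" means the Klir-style non-specificity (U-uncertainty) of a normalized possibility distribution, as used in fuzzy decision trees. It is shown here as a standalone measure because the abstract does not say how it enters the split criterion. The names `threshold_partition`, `choose_node`, and the `"elm_leaf"` marker are hypothetical, and training the ELM at a leaf is left out.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def ambiguity(possibilities):
    """Non-specificity (U-uncertainty) of a possibility distribution.

    Assumption: this Klir-style measure stands in for the paper's 'ambiguity'.
    At least one possibility value must be positive.
    """
    top = max(possibilities)
    # Normalize so the largest value is 1, sort descending, append pi_{n+1} = 0.
    pi = sorted((p / top for p in possibilities), reverse=True) + [0.0]
    return sum((pi[i] - pi[i + 1]) * math.log(i + 1) for i in range(len(pi) - 1))


def threshold_partition(values, t):
    """Hypothetical helper: binary split of a continuous attribute at t."""
    left = [i for i, v in enumerate(values) if v <= t]
    right = [i for i, v in enumerate(values) if v > t]
    return [left, right]


def gain_ratio(labels, partition):
    """C4.5-style gain ratio of splitting `labels` by the index lists in `partition`."""
    n = len(labels)
    children = [[labels[i] for i in idx] for idx in partition if idx]
    gain = entropy(labels) - sum(len(c) / n * entropy(c) for c in children)
    split_info = -sum(len(c) / n * math.log2(len(c) / n) for c in children)
    # A degenerate split (one non-empty child) has zero split info; score it 0.
    return gain / split_info if split_info > 0 else 0.0


def choose_node(labels, candidate_splits, threshold):
    """Hypothetical node rule: ('split', name) for the best candidate, or
    ('elm_leaf', None) when no candidate's gain ratio reaches `threshold`."""
    best_name, best_gr = None, -math.inf
    for name, partition in candidate_splits.items():
        gr = gain_ratio(labels, partition)
        if gr > best_gr:
            best_name, best_gr = name, gr
    if best_gr < threshold:
        # Stop partitioning: the caller would train an ELM on this node's data.
        return ("elm_leaf", None)
    return ("split", best_name)
```

For example, with labels `['a', 'a', 'b', 'b']` and a continuous attribute `[0.1, 0.2, 0.8, 0.9]`, `threshold_partition(values, 0.5)` separates the two classes perfectly and yields a gain ratio of 1.0, so `choose_node` splits there. If every candidate scores below the threshold, the node becomes an ELM leaf instead, which is the mechanism the abstract describes for curbing over-partitioning.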
