Improvement in Classification Algorithms through Model Stacking with the Consideration of their Correlation

In this research we analyzed the performance of several well-known classification algorithms in terms of accuracy and proposed a methodology for model stacking based on their correlation, which improves the accuracy of these algorithms. We selected Support Vector Machines (svm), Naive Bayes (nb), k-Nearest Neighbors (knn), Generalized Linear Model (glm), Linear Discriminant Analysis (lda), Gradient Boosting Machine (gbm), Recursive Partitioning and Regression Trees (rpart), Regularized Discriminant Analysis (rda), Neural Networks (nnet), and Conditional Inference Trees (ctree), and performed analyses on four textual datasets of different sizes: Scopus with 50,000 instances, IMDB Movie Reviews with 10,000 instances, Amazon Product Reviews with 1,000 instances, and the Yelp dataset with 1,000 instances. We used RStudio for the experiments. Results show that the performance of all algorithms increased at the meta level. Neural Networks achieved the best results, with more than 25% improvement at the meta level, and outperformed the other evaluated methods with an accuracy of 95.66%; altogether, our stacked model gives far better results than the individual algorithms.
