Ensemble of classifiers to improve accuracy of the CLIP4 machine-learning algorithm

Machine learning, one of the tools of data mining and knowledge discovery, addresses the automated extraction of knowledge from data, expressed in the form of production rules. This paper describes a method for improving the accuracy of rules generated by an inductive machine learning algorithm by building an ensemble of classifiers. The method generates multiple classifiers using the CLIP4 algorithm and combines them with a voting scheme. The set of distinct classifiers is produced by injecting controlled randomness into the learning algorithm, without modifying the training data set; the approach relies on characteristic properties of the CLIP4 algorithm. A case study of a SPECT heart image analysis system, where high accuracy is essential, is used as an example. Benchmarking results on other well-known machine learning datasets are also presented, together with a comparison against an algorithm that uses boosting to improve its accuracy. Unlike boosting, the proposed method always improved accuracy relative to a single classifier generated by the CLIP4 algorithm, and the results are comparable with those of state-of-the-art machine learning algorithms.
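The combination step described above can be sketched in a few lines. The sketch below is a minimal illustration, not the paper's actual method: CLIP4 is not reproduced here, so a hypothetical `toy_train` stand-in plays the role of the base learner, with controlled randomness injected through a seed (rather than by resampling the training data), and predictions combined by majority vote.

```python
import random
from collections import Counter

def train_randomized_classifiers(train_fn, data, n_classifiers, base_seed=0):
    """Train several classifiers on the SAME training data, varying only
    a controlled random seed (no resampling, mirroring the paper's idea)."""
    return [train_fn(data, seed=base_seed + i) for i in range(n_classifiers)]

def vote(classifiers, x):
    """Combine the classifiers' outputs with a simple majority-vote scheme."""
    predictions = [clf(x) for clf in classifiers]
    return Counter(predictions).most_common(1)[0][0]

def toy_train(data, seed):
    # Hypothetical stand-in for the base learner: a 1-D threshold rule whose
    # cut point is perturbed by the seeded randomness.
    rng = random.Random(seed)
    threshold = sum(x for x, _ in data) / len(data) + rng.uniform(-0.1, 0.1)
    return lambda x: 1 if x > threshold else 0

data = [(0.0, 0), (1.0, 1), (0.2, 0), (0.8, 1)]
ensemble = train_randomized_classifiers(toy_train, data, n_classifiers=5)
label = vote(ensemble, 0.9)  # majority decision of the five randomized rules
```

Because each member sees identical data, any diversity in the ensemble comes solely from the injected randomness; the vote then smooths out the individual classifiers' idiosyncratic errors.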
