Choosing the Most Effective Pattern Classification Model under Learning-Time Constraint

Large datasets are now common and demand faster and more effective pattern analysis techniques. However, methodologies for comparing classifiers usually do not take into account the learning-time constraints imposed by applications. This work presents a methodology for comparing classifiers with respect to their ability to learn from classification errors on a large learning set within a given time limit. Faster techniques can consume more training samples within the budget, but they achieve higher performance on unseen test sets only when they are also more effective learners. We demonstrate this result using several techniques, multiple datasets, and learning-time limits typical of real applications.
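The comparison protocol can be sketched in a few lines: each classifier trains for the same wall-clock budget on a large learning set and is then evaluated on a held-out test set. The sketch below is illustrative, not the paper's exact setup; the choice of scikit-learn models (SGDClassifier, GaussianNB), the synthetic dataset, the mini-batch feeding via partial_fit, and the 5-second budget are all assumptions made for the example.

import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

def train_within_budget(model, X_learn, y_learn, classes, budget_s, batch=512):
    """Feed mini-batches to `model` until the wall-clock budget is spent.

    Returns how many samples the model consumed, so a faster learner
    naturally sees more of the large learning set within the same budget.
    """
    deadline = time.monotonic() + budget_s
    seen = 0
    n = len(X_learn)
    while time.monotonic() < deadline:
        lo = seen % n                      # wrap around the learning set
        hi = min(lo + batch, n)
        model.partial_fit(X_learn[lo:hi], y_learn[lo:hi], classes=classes)
        seen += hi - lo
    return seen

# Hypothetical data: a large synthetic learning set plus a held-out test set.
X, y = make_classification(n_samples=200_000, n_features=50, random_state=0)
X_learn, X_test, y_learn, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
classes = np.unique(y_learn)

# Same time budget for every classifier; compare test-set performance.
for name, model in [("linear SGD", SGDClassifier(random_state=0)),
                    ("Gaussian NB", GaussianNB())]:
    seen = train_within_budget(model, X_learn, y_learn, classes, budget_s=5.0)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: consumed {seen} samples, test accuracy {acc:.3f}")

The key design point this illustrates is that the budget is fixed in wall-clock time rather than in number of samples or epochs, so speed and effectiveness trade off exactly as the abstract describes.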
