A Preliminary Empirical Study of the Most Common Statistical Tests in Machine Learning

Abstract: There is currently no experimental design that is universally accepted by machine learning researchers. Opinions differ, for example, on what proportion of the sample should be held out for the validation phase and on how those examples should be selected. This paper reviews the most relevant literature on the subject and discusses preliminary conclusions drawn from an empirical analysis of the power of several tests commonly used by data mining researchers. The experimental study is carried out on several synthetic datasets with known theoretical properties.
