A direct measure of discriminant and characteristic capability for classifier building and assessment

Performance measures are used at various stages of solving a classification problem. Unfortunately, most of these measures are biased, meaning that their values depend strictly on the class ratio, i.e., on the imbalance between negative and positive samples. After identifying the source of bias in the best-known measures, this work defines novel unbiased measures that capture the concepts of discriminant and characteristic capability. Used in combination, these measures can give important information to researchers working on machine learning or pattern recognition tasks, in particular for classifier performance assessment and feature selection.
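
The dependence on class ratio can be illustrated with a small sketch. It does not implement the measures proposed in the paper, which the abstract does not define; it only compares standard measures (accuracy, precision, sensitivity, specificity) for a hypothetical classifier whose per-class rates are held fixed while the class ratio changes. The function names, the rates 0.9 and 0.8, and the sample counts are all assumptions chosen for illustration.

```python
def confusion_counts(n_pos, n_neg, sensitivity, specificity):
    """Expected confusion-matrix counts for fixed per-class rates (illustrative)."""
    tp = sensitivity * n_pos   # positives correctly recognized
    fn = n_pos - tp            # positives missed
    tn = specificity * n_neg   # negatives correctly rejected
    fp = n_neg - tn            # negatives wrongly accepted
    return tp, fn, tn, fp


def measures(n_pos, n_neg, sensitivity=0.9, specificity=0.8):
    """Standard measures computed from the expected counts."""
    tp, fn, tn, fp = confusion_counts(n_pos, n_neg, sensitivity, specificity)
    return {
        "sensitivity": tp / (tp + fn),            # depends on positives only
        "specificity": tn / (tn + fp),            # depends on negatives only
        "accuracy": (tp + tn) / (n_pos + n_neg),  # mixes both classes
        "precision": tp / (tp + fp),              # mixes both classes
    }


# Same hypothetical classifier, two different class ratios (assumed values).
balanced = measures(n_pos=500, n_neg=500)
imbalanced = measures(n_pos=50, n_neg=950)

for name in balanced:
    print(f"{name:12s} balanced={balanced[name]:.3f} imbalanced={imbalanced[name]:.3f}")
```

In this sketch sensitivity and specificity are identical in both settings, because each is computed within a single class, whereas accuracy and precision change with the class ratio even though the classifier's behavior on each class is the same.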
