Empirically-estimable multi-class classification bounds

In this paper, we extend previously developed non-parametric bounds on the Bayes risk from binary classification problems to multi-class problems. Compared with the well-known Bhattacharyya bound, which is typically computed under parametric assumptions, the bounds proposed here are directly estimable from data, provably tighter, and more robust to different types of data. We verify the tightness and validity of these bounds using an illustrative synthetic example, and further demonstrate their value by incorporating them into a feature selection algorithm, which we apply to the real-world problem of distinguishing between different neuro-motor disorders based on sentence-level speech data.
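
For context, the parametric baseline that the abstract contrasts against can be sketched as follows. This is only an illustration, not the bound proposed in the paper. It assumes two classes, Gaussian class-conditional densities, and priors estimated from the class sample proportions. The function name `bhattacharyya_bound_gaussian` is hypothetical. Under those assumptions the Bayes error is upper-bounded by sqrt(p0 * p1) * exp(-D_B), where D_B is the Bhattacharyya distance between the two fitted Gaussians.

```python
import numpy as np


def bhattacharyya_bound_gaussian(x0, x1):
    """Upper bound on the binary Bayes error under a Gaussian assumption.

    x0, x1: arrays of shape (n_samples, n_features), one per class.
    Assumption (not from the paper): each class is modeled as a Gaussian
    and priors are taken to be the class sample proportions.
    """
    x0 = np.asarray(x0, dtype=float)
    x1 = np.asarray(x1, dtype=float)

    # Empirical priors from class sizes.
    n0, n1 = len(x0), len(x1)
    p0 = n0 / (n0 + n1)
    p1 = 1.0 - p0

    # Plug-in Gaussian parameters for each class.
    m0, m1 = x0.mean(axis=0), x1.mean(axis=0)
    s0 = np.atleast_2d(np.cov(x0, rowvar=False))
    s1 = np.atleast_2d(np.cov(x1, rowvar=False))
    s = 0.5 * (s0 + s1)

    # Bhattacharyya distance between two Gaussians:
    #   (1/8) d^T S^{-1} d + (1/2) ln( det S / sqrt(det S0 * det S1) )
    diff = m1 - m0
    mean_term = diff @ np.linalg.solve(s, diff) / 8.0
    _, logdet_s = np.linalg.slogdet(s)
    _, logdet_s0 = np.linalg.slogdet(s0)
    _, logdet_s1 = np.linalg.slogdet(s1)
    cov_term = 0.5 * (logdet_s - 0.5 * (logdet_s0 + logdet_s1))

    return np.sqrt(p0 * p1) * np.exp(-(mean_term + cov_term))
```

When the Gaussian assumption is badly violated, this plug-in value need not be a valid bound, which is the kind of limitation that motivates bounds estimated directly from data.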
