Meta learning of bounds on the Bayes classifier error

Meta learning uses information from base learners (e.g., classifiers or estimators), together with information about the learning problem, to improve upon the performance of any single base learner. For example, the Bayes error rate of a given feature space, if known, can aid in choosing a classifier, as well as in feature selection and model selection for both the base classifiers and the meta classifier. Recent work on f-divergence functional estimation has produced simple, rapidly converging estimators that can be used to estimate various bounds on the Bayes error. We estimate multiple bounds on the Bayes error using an estimator that applies meta learning to slowly converging plug-in estimators in order to achieve the parametric convergence rate. We compare the estimated bounds empirically on simulated data and then estimate the tighter bounds on features extracted from an image patch analysis of sunspot continuum and magnetogram images.
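As a minimal illustration of the kind of divergence-based bound involved (the classical Bhattacharyya bounds are shown here for context; the specific bounds estimated and compared in the paper may differ), consider two classes with prior probabilities $p$ and $q = 1 - p$ and class-conditional densities $f_0$ and $f_1$. The Bayes error is

\[
\epsilon^{*} \;=\; \int \min\bigl(p\,f_0(x),\, q\,f_1(x)\bigr)\,dx ,
\]

and the Bhattacharyya coefficient $\rho = \int \sqrt{f_0(x)\,f_1(x)}\,dx$ gives

\[
\tfrac{1}{2}\Bigl(1 - \sqrt{1 - 4pq\,\rho^{2}}\Bigr) \;\le\; \epsilon^{*} \;\le\; \sqrt{pq}\,\rho .
\]

Replacing $\rho$ (or, more generally, the relevant f-divergence) with a data-driven estimate, such as an ensemble of plug-in estimators of the type described above, makes these bounds empirically estimable from labeled samples alone.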
