Improving Bayesian credibility intervals for classifier error rates using maximum entropy empirical priors

OBJECTIVE Successful use of classifiers that learn to make decisions from a set of patient examples requires robust methods for performance estimation. Recently, many promising approaches for determining an upper bound on the error rate of a single classifier have been reported, but the Bayesian credibility interval (CI) obtained from a conventional holdout test still delivers one of the tightest bounds. However, the conventional Bayesian CI becomes unacceptably wide in real-world applications where the test set contains fewer than a few hundred examples. The source of this problem is the fact that the CI is determined exclusively by the result on the test examples. In other words, the uniform prior density distribution employed provides no information at all, reflecting a complete lack of prior knowledge about the unknown error rate. The aim of the study reported here was therefore to investigate a maximum entropy (ME) based approach to improving prior knowledge and Bayesian CIs, and to demonstrate its relevance for biomedical research and clinical practice.

METHOD AND MATERIAL It is demonstrated how a refined, non-uniform prior density distribution can be obtained by means of the ME principle, using empirical results from a few classifier designs and tests on non-overlapping sets of examples.

RESULTS Experimental results show that ME-based priors improve the CIs when applied to four quite different simulated data sets and two real-world data sets.

CONCLUSIONS An empirically derived ME prior seems promising for improving the Bayesian CI for the unknown error rate of a designed classifier.
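To make the contrast concrete: with a uniform prior on the error rate e, observing k errors among n independent holdout examples gives the posterior Beta(k+1, n-k+1), so the CI depends only on the test result. The sketch below is a minimal illustration under assumptions of our own, not the authors' method. It assumes the only constraint placed on the ME prior is its mean (taken, hypothetically, from pooled earlier tests), in which case the maximum entropy density on [0, 1] is proportional to exp(λe), with the Lagrange multiplier λ found numerically. It then reports an equal-tailed interval computed on a grid. The function names `me_mean`, `solve_lambda` and `credible_interval`, and all numbers in the demo, are hypothetical.

```python
import math


def me_mean(lam: float) -> float:
    """Mean of the ME density p(e) proportional to exp(lam * e) on [0, 1]."""
    if abs(lam) < 1e-4:
        # Series expansion near lam = 0 (lam = 0 is the uniform prior, mean 1/2)
        return 0.5 + lam / 12.0 - lam**3 / 720.0
    # Closed form E[e] = 1/(1 - exp(-lam)) - 1/lam, using expm1 for accuracy
    return 1.0 / (-math.expm1(-lam)) - 1.0 / lam


def solve_lambda(target_mean: float, lo: float = -500.0, hi: float = 500.0) -> float:
    """Find the Lagrange multiplier lam such that me_mean(lam) == target_mean.

    me_mean is increasing in lam, so plain bisection suffices.
    """
    if not 0.0 < target_mean < 1.0:
        raise ValueError("target_mean must lie strictly between 0 and 1")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if me_mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def credible_interval(k: int, n: int, lam: float = 0.0,
                      level: float = 0.95, grid: int = 20000):
    """Posterior mean and equal-tailed credibility interval for the error rate e.

    Likelihood: k errors among n independent holdout examples, e^k (1-e)^(n-k).
    Prior: p(e) proportional to exp(lam * e); lam = 0 is the uniform prior,
    for which the posterior is Beta(k + 1, n - k + 1).
    The posterior is evaluated with the midpoint rule on `grid` cells of [0, 1].
    """
    es = [(i + 0.5) / grid for i in range(grid)]
    log_post = [k * math.log(e) + (n - k) * math.log1p(-e) + lam * e for e in es]
    peak = max(log_post)  # subtract the maximum to avoid underflow
    w = [math.exp(v - peak) for v in log_post]
    total = sum(w)
    mean = sum(e * wi for e, wi in zip(es, w)) / total

    tail = (1.0 - level) / 2.0
    lower = upper = None
    cum = 0.0
    for e, wi in zip(es, w):
        cum += wi / total
        if lower is None and cum >= tail:
            lower = e
        if cum >= 1.0 - tail:
            upper = e
            break
    if upper is None:  # guard against rounding in the cumulative sum
        upper = es[-1]
    return mean, lower, upper


if __name__ == "__main__":
    # Hypothetical illustration only: 12 errors in 150 examples pooled from
    # earlier designs/tests on non-overlapping example sets, then a new
    # holdout test with 5 errors in 50 examples.
    lam = solve_lambda(12 / 150)
    print("uniform prior:", credible_interval(5, 50))
    print("ME prior (lam = %.3f):" % lam, credible_interval(5, 50, lam))
```

When the prior mean is below 1/2, λ is negative and the ME prior decreases in e, so reweighting the posterior by it can only lower (or leave unchanged) both interval limits; whether the interval also becomes narrower depends on k, n and λ. Additional moment constraints would each add a Lagrange multiplier, to be solved for jointly.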
