Credible classification for environmental problems

Classifiers that aim to make credible predictions should rely on carefully elicited prior knowledge. Often such knowledge is not available, so they should start learning from data in conditions of near-ignorance. This paper shows empirically, on an agricultural data set, that established classification methods do not always adhere to this principle. Traditional ways of representing prior ignorance are shown to carry an overwhelming weight compared to the information in the data, producing overconfident predictions. This point is crucial for problems, such as environmental ones, where prior knowledge is often scarce and even the data may not be known precisely. Credal classification, and in particular the naive credal classifier, is proposed as a more faithful way to cope with the ignorance problem. With credal classification, conditions of ignorance may limit the power of the inferences, but not the credibility of the predictions.
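The contrast between a single "ignorance" prior and a set of priors can be illustrated with a minimal sketch. It is not the naive credal classifier itself: it ignores attributes entirely, looks only at class counts, and uses interval dominance as the decision rule. It assumes a Laplace (uniform) prior for the precise estimate and Walley's imprecise Dirichlet model with hyperparameter s = 2 for the interval estimate. The function names, the class labels and the counts are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def laplace_estimate(counts):
    """Precise class probabilities under a uniform (Laplace) prior."""
    k = len(counts)
    n = sum(counts.values())
    return {c: Fraction(m + 1, n + k) for c, m in counts.items()}


def idm_intervals(counts, s=2):
    """Lower and upper class probabilities under the imprecise Dirichlet model.

    s is the usual IDM hyperparameter; it must be a positive integer here
    because exact fractions are used.
    """
    n = sum(counts.values())
    return {c: (Fraction(m, n + s), Fraction(m + s, n + s))
            for c, m in counts.items()}


def undominated(intervals):
    """Classes not interval-dominated by any other class.

    A class is dropped only if some other class has a lower probability
    strictly greater than this class's upper probability.
    """
    keep = []
    for c, (_, upper) in intervals.items():
        dominated = any(other_lower > upper
                        for o, (other_lower, _) in intervals.items()
                        if o != c)
        if not dominated:
            keep.append(c)
    return sorted(keep)


# Hypothetical class counts: the same 2:1 ratio at two sample sizes.
small = {"high": 2, "low": 1}
large = {"high": 20, "low": 10}

small_point = laplace_estimate(small)
small_set = undominated(idm_intervals(small))
large_set = undominated(idm_intervals(large))
```

With three observations the precise estimate still singles out one class, while the interval estimate leaves both classes undominated and so returns a set. With thirty observations at the same ratio, the intervals separate and a single class remains. In this simplified setting, scarce data reduces how informative the output is rather than how trustworthy it is.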
