Learning Reliable Classifiers From Small or Incomplete Data Sets: The Naive Credal Classifier 2

In this paper, the naive credal classifier, a set-valued counterpart of naive Bayes, is extended to a general and flexible treatment of incomplete data, yielding a new classifier called the naive credal classifier 2 (NCC2). The new classifier delivers reliable classifications even with small sample sizes and missing values. Extensive empirical evaluations show that, by issuing set-valued classifications, NCC2 isolates and properly handles the instances that are hard to classify (on which the accuracy of naive Bayes drops considerably), while performing as well as naive Bayes on the remaining instances. The experiments also point to a general problem: with missing values, empirical evaluations may not reliably estimate the accuracy of a traditional classifier such as naive Bayes. This phenomenon adds further value to the robust approach to classification implemented by NCC2.
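The comparison described above can be sketched as follows: the accuracy of naive Bayes is computed separately on the instances where the credal classifier returns a single class and on those where it returns a set of classes. This is a minimal, hypothetical helper, not the authors' evaluation code. The function name `split_report`, its field names, and the toy labels are illustrative assumptions, and the credal classifier's set-valued outputs are taken as given rather than computed.

```python
from dataclasses import dataclass
from typing import Hashable, Sequence, Set


@dataclass
class SplitReport:
    determinacy: float           # fraction of instances with a single predicted class
    nb_acc_determinate: float    # naive Bayes accuracy where the credal output is one class
    nb_acc_indeterminate: float  # naive Bayes accuracy where the credal output has several classes
    set_accuracy: float          # fraction of set-valued outputs that contain the true class


def split_report(credal_sets: Sequence[Set[Hashable]],
                 nb_preds: Sequence[Hashable],
                 truths: Sequence[Hashable]) -> SplitReport:
    """Hypothetical helper: split the test set by the credal classifier's output size."""
    if not (len(credal_sets) == len(nb_preds) == len(truths)) or not truths:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(len(s) == 0 for s in credal_sets):
        raise ValueError("a credal classifier should never return an empty set")

    det = [i for i, s in enumerate(credal_sets) if len(s) == 1]
    ind = [i for i, s in enumerate(credal_sets) if len(s) > 1]

    def nb_accuracy(idx):
        # NaN signals that the subset is empty, so no accuracy can be estimated.
        if not idx:
            return float("nan")
        return sum(nb_preds[i] == truths[i] for i in idx) / len(idx)

    set_acc = (sum(truths[i] in credal_sets[i] for i in ind) / len(ind)
               if ind else float("nan"))
    return SplitReport(len(det) / len(truths), nb_accuracy(det), nb_accuracy(ind), set_acc)


if __name__ == "__main__":
    # Toy, made-up outputs for six instances (illustrative only, not experimental results).
    credal = [{"a"}, {"b"}, {"a", "b"}, {"a"}, {"a", "b", "c"}, {"b"}]
    nb = ["a", "b", "b", "a", "c", "b"]
    truth = ["a", "b", "a", "a", "b", "b"]
    print(split_report(credal, nb, truth))
```

Keeping the two accuracies separate, rather than reporting a single overall figure, is what makes the pattern described in the abstract visible: a drop in naive Bayes accuracy concentrated on the instances that receive set-valued outputs.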
