Discriminative vs Informative Learning

The goal of pattern classification can be approached from two points of view: informative, where the classifier learns the class densities, or discriminative, where the focus is on learning the class boundaries without regard to the underlying class densities. We review and synthesize the tradeoffs between these two approaches for simple classifiers, and extend the results to modern techniques such as Naive Bayes and Generalized Additive Models. Data mining applications often operate in the domain of high-dimensional features, where the tradeoffs between informative and discriminative classifiers are especially relevant. Experimental results are provided for simulated and real data.
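To make the distinction concrete, the following minimal sketch contrasts the two views on simulated two-class data. It is an illustration under stated assumptions, not the experimental setup of the paper: the informative classifier is assumed to be a Gaussian model with a shared covariance matrix (normal discriminant analysis), the discriminative classifier is assumed to be logistic regression fitted by plain gradient descent, and the function names, sample sizes, and step size are hypothetical choices.

```python
import numpy as np

def sample(rng, n, d=2, shift=2.0):
    """Simulated data (assumption): unit-variance Gaussians, class 1 shifted along axis 0."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, d))
    X[:, 0] += shift * y
    return X, y

def fit_informative(X, y):
    """Informative view: estimate the class densities, then derive the boundary from them."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled (shared) covariance estimate of the two Gaussian class densities.
    S = ((X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)) / (len(X) - 2)
    w = np.linalg.solve(S, mu1 - mu0)
    # Intercept from the log density ratio plus the log prior ratio.
    b = -0.5 * (mu1 + mu0) @ w + np.log(len(X1) / len(X0))
    return w, b

def fit_discriminative(X, y, lr=0.1, steps=2000):
    """Discriminative view: fit the boundary directly by maximizing the
    conditional likelihood of the labels; no class density is modeled."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # P(y = 1 | x)
        g = p - y                               # gradient of the negative log-likelihood
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

def accuracy(w, b, X, y):
    """Fraction of points on the correct side of the linear boundary w.x + b = 0."""
    return float(np.mean(((X @ w + b) > 0).astype(int) == y))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_train, y_train = sample(rng, 200)
    X_test, y_test = sample(rng, 2000)
    print("informative   :", accuracy(*fit_informative(X_train, y_train), X_test, y_test))
    print("discriminative:", accuracy(*fit_discriminative(X_train, y_train), X_test, y_test))
```

Both rules are linear in this setting, so any difference in test accuracy comes from how the boundary is estimated rather than from its functional form. Changing `sample` so that the Gaussian assumption is violated, or increasing `d` relative to the training size, is one way to explore the tradeoffs the abstract describes.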
