Simultaneous Predictive Gaussian Classifiers

The Gaussian distribution has been ubiquitous in the theory and practice of statistical classification for several decades. Despite early proposals motivating the use of predictive inference to design a classifier, this approach has received relatively little attention apart from certain specific applications, such as speech recognition, where its optimality has been widely acknowledged. Here we examine the statistical properties of different inductive classification rules under a generic Gaussian model and demonstrate the optimality of classifying multiple samples simultaneously under an attractive loss function. We show that the simpler independent classification of samples leads asymptotically to the same optimal rule as the simultaneous classifier as the amount of training data increases, provided that the dimensionality of the feature space is bounded in an appropriate manner. Numerical investigations suggest that the simultaneous predictive classifier can achieve higher classification accuracy than the independent rule in the low-dimensional case, whereas the simultaneous approach suffers more from noise as the dimensionality increases.
