Inductive Inference and Partition Exchangeability in Classification

Inductive inference has been studied intensively for several decades. For classification problems in particular, substantial advances have been made, and the field has matured into a wide range of powerful approaches. However, a considerable challenge arises when deriving principles for an inductive supervised classifier in the presence of unpredictable or unanticipated events, which correspond to unknown alphabets of observable features. Bayesian inductive theories based on de Finetti type exchangeability, which have become popular in supervised classification, do not apply to such problems. Here we derive an inductive supervised classifier based on partition exchangeability, a concept due to John Kingman. We prove that, in contrast to classifiers based on de Finetti type exchangeability, which can optimally handle test items independently of each other when infinite amounts of training data are available, a classifier based on partition exchangeability continues to benefit from jointly predicting the labels of the whole population of test items. Some remarks on the relation of this work to generic convergence results in predictive inference are also given.
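
To make the notion of partition exchangeability concrete: under it, the probability of a sequence depends only on how many distinct symbols occur once, how many occur twice, and so on, not on which symbols they are. The following minimal sketch computes that statistic and checks that it is unchanged by relabeling or reordering. The function name and the toy sequences are hypothetical illustrations and are not taken from the paper, and the sketch is not the classifier derived there.

```python
from collections import Counter


def partition_statistic(sequence):
    """Return {t: number of distinct symbols that occur exactly t times}.

    Hypothetical helper for illustration only. The symbol identities are
    discarded, so the alphabet does not need to be known in advance.
    """
    symbol_counts = Counter(sequence)                 # symbol -> frequency
    count_of_counts = Counter(symbol_counts.values()) # frequency -> how many symbols
    return dict(sorted(count_of_counts.items()))


# Toy data (assumed, not from the paper).
observed = ["a", "b", "a", "c", "a", "b"]
relabeled = ["x", "y", "x", "z", "x", "y"]   # same pattern, new alphabet
reordered = ["c", "b", "a", "a", "b", "a"]   # same symbols, new order
different = ["a", "a", "a", "a", "b", "c"]   # different frequency pattern

print(partition_statistic(observed))
print(partition_statistic(observed) == partition_statistic(relabeled))
print(partition_statistic(observed) == partition_statistic(reordered))
print(partition_statistic(observed) == partition_statistic(different))
```

Because the statistic ignores symbol identities, a previously unseen feature value in a test item simply changes the counts of counts, which is what allows unknown alphabets to be handled.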
