Pattern recognition with a Bayesian kernel combination machine

In this paper, we describe a Bayesian classification method for multiclass problems that informatively combines diverse sources of information and multiple feature spaces. The proposed method builds on recent advances in kernel methods, in which multiple object descriptors, or feature spaces, are integrated via kernel combination. Each kernel defines a similarity metric between objects in a particular feature space; because every modality is then expressed through the same kind of metric, the kernels can be combined into an overall kernel. We follow a hierarchical Bayesian approach, which places prior distributions over the model's random variables, and we construct a Gibbs sampling Markov chain Monte Carlo (MCMC) solution that follows naturally from the employed multinomial probit likelihood. The methodology also provides a basis for deterministic approximations such as variational or maximum a posteriori (MAP) estimators. We compare it against well-known classifier combination methods on the classification of handwritten numerals. The proposed method significantly improves on the best individual classifier and matches the performance of the best multiple classifier combination, whilst reducing the computational cost of combining classifiers and offering additional information on the significance of the contributing sources.
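The two ingredients named above, a weighted combination of per-feature-space kernels and a multinomial probit likelihood, can be illustrated with a short Python sketch. It is not the authors' implementation and omits the Gibbs sampler entirely. It assumes RBF base kernels, non-negative kernel weights rescaled to sum to one, and fixed, hypothetical per-class weights W; the helper names (rbf_kernel, combine_kernels, probit_class_probs) are illustrative only. The class probabilities use the standard identity for a multinomial probit with independent unit-variance auxiliary variables y_c = m_c + e_c and label t = argmax_c y_c, namely P(t = i | m) = E_u[prod_{j != i} Phi(u + m_i - m_j)] with u ~ N(0, 1), evaluated here by the trapezoid rule.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def rbf_kernel(X: Sequence[Sequence[float]], gamma: float) -> Matrix:
    """Gram matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2) for one feature space."""
    n = len(X)
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(X[i], X[j])))
             for j in range(n)] for i in range(n)]


def combine_kernels(kernels: Sequence[Matrix], beta: Sequence[float]) -> Matrix:
    """Weighted sum sum_s beta_s * K^s, with the weights rescaled to sum to one."""
    if len(kernels) != len(beta) or any(b < 0 for b in beta) or sum(beta) <= 0:
        raise ValueError("need one non-negative weight per kernel, not all zero")
    total = sum(beta)
    w = [b / total for b in beta]
    n = len(kernels[0])
    return [[sum(w_s * K[i][j] for w_s, K in zip(w, kernels)) for j in range(n)]
            for i in range(n)]


def std_normal_cdf(x: float) -> float:
    """Phi(x), the standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def probit_class_probs(m: Sequence[float], grid: int = 4001, lim: float = 8.0) -> List[float]:
    """P(t = i | m) = E_u[prod_{j != i} Phi(u + m_i - m_j)] with u ~ N(0, 1),
    integrated by the trapezoid rule on [-lim, lim]."""
    h = 2.0 * lim / (grid - 1)
    probs = []
    for i, m_i in enumerate(m):
        acc = 0.0
        for k in range(grid):
            u = -lim + k * h
            density = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
            prod = 1.0
            for j, m_j in enumerate(m):
                if j != i:
                    prod *= std_normal_cdf(u + m_i - m_j)
            end_weight = 0.5 if k in (0, grid - 1) else 1.0  # trapezoid end points
            acc += end_weight * density * prod
        probs.append(acc * h)
    return probs


if __name__ == "__main__":
    # Two hypothetical feature spaces describing the same three objects.
    view_a = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    view_b = [[0.0], [2.0], [0.5]]
    K = combine_kernels([rbf_kernel(view_a, 0.5), rbf_kernel(view_b, 1.0)], [0.7, 0.3])

    # Hypothetical per-class weights W[c][n]; fixed here, not sampled.
    W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    m = [sum(W[c][n] * K[n][0] for n in range(3)) for c in range(3)]  # scores of object 0
    p = probit_class_probs(m)
    print([round(x, 3) for x in p], "predicted class:", p.index(max(p)))
```

In the demo, object 0 is most similar to itself under the combined kernel, so its own class receives the largest score m_0 and therefore the largest probability. In the hierarchical Bayesian model described above, quantities such as W and the kernel weights would presumably be random variables updated by the sampler rather than fixed as they are here.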
