Meta-classification: Combining Multimodal Classifiers

Combining multiple classifiers is of particular interest in multimedia applications. Each modality in multimedia data can be analyzed individually, and combining multiple pieces of evidence usually improves classification accuracy. However, most combination strategies used in previous studies rely on ad hoc designs and ignore the varying expertise of specialized single-modality classifiers in recognizing a category under particular circumstances. In this paper we present a combination framework called meta-classification, which models the problem of combining classifiers as a classification problem itself. We apply the technique to a wearable experience collection system, which unobtrusively records the wearer's conversation, recognizes the face of the dialogue partner, and remembers his/her voice. When the system sees the same person's face or hears the same voice, it can then use a summary of the last conversation to remind the wearer. To identify a person correctly from a mixture of audio and video streams, classification judgments from multiple modalities must be effectively combined. Experimental results show that combining different face recognizers and speaker identification aspects using the meta-classification strategy can dramatically improve classification accuracy, and is more effective than a fixed probability-based strategy. Other work on labeling weather news broadcasts showed that meta-classification is a general framework that can be applied, without much modification, to any application that needs to combine multiple classifiers.
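The idea of treating classifier combination as a classification problem can be illustrated with a small sketch. It is not the system described in the abstract. The data are synthetic, the two "modalities" are simulated scores with different noise levels, and the meta-classifier is a plain logistic regression trained by gradient descent, chosen only for brevity. All function names, noise levels, and the averaging baseline are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_scores(n):
    """Simulate outputs of two base classifiers for a binary identity decision.

    Assumption for illustration: the 'face' score is reliable (low noise) and
    the 'voice' score is unreliable (high noise). Neither models a real recognizer.
    """
    y = rng.integers(0, 2, size=n)
    face = y + rng.normal(0.0, 0.3, size=n)
    voice = y + rng.normal(0.0, 1.5, size=n)
    return np.column_stack([face, voice]), y

def train_meta_classifier(X, y, lr=0.5, steps=2000):
    """Fit a logistic-regression meta-classifier on base-classifier scores."""
    Xb = np.column_stack([X, np.ones(len(X))])   # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))      # predicted probabilities
        w -= lr * (Xb.T @ (p - y)) / len(y)      # gradient step on the log loss
    return w

def meta_predict(w, X):
    """Predict labels from base scores using the learned weights."""
    Xb = np.column_stack([X, np.ones(len(X))])
    return (Xb @ w > 0).astype(int)

def fixed_average_predict(X):
    """Fixed combination rule: average the scores and threshold at 0.5."""
    return (X.mean(axis=1) > 0.5).astype(int)

X_train, y_train = simulate_scores(400)
X_test, y_test = simulate_scores(1000)

w = train_meta_classifier(X_train, y_train)
meta_acc = float((meta_predict(w, X_test) == y_test).mean())
fixed_acc = float((fixed_average_predict(X_test) == y_test).mean())

print("learned weights (face, voice, bias):", np.round(w, 2))
print("meta-classifier accuracy:", meta_acc)
print("fixed averaging accuracy:", fixed_acc)
```

In this toy setting the learned weights favor the more reliable simulated modality, whereas the fixed averaging rule treats both scores equally. This is one simple way to see why a trained combiner can account for the differing expertise of individual classifiers.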
