Triggering Memories of Conversations using Multimodal Classifiers

Our personal conversation memory agent is a wearable ‘experience collection’ system, which unobtrusively records the wearer’s conversation, recognizes the face of the dialog partner and remembers his/her voice. When the system sees the same person’s face or hears the same voice it uses a summary of the last conversation with this person to remind the wearer. To correctly identify a person and help remember the earlier conversation, the system must be aware of the current situation, as analyzed from audio and video streams, and classify the situation by combining these modalities. Multimodal classifiers, however, are relatively unstable in the uncontrolled real word environments, and a simple linear interpolation of multiple classification judgments cannot effectively combine multimodal classifiers. We propose a meta-classification strategy using a Support Vector Machine as a new combination strategy. Experimental results show that combining face recognition and speaker identification by meta-classification is dramatically more effective than a linear combination. This meta-classification approach is general enough to be applied to any situation-aware application that needs to combine multiple classifiers.

[1]  M. Turk,et al.  Eigenfaces for Recognition , 1991, Journal of Cognitive Neuroscience.

[2]  Herbert Gish,et al.  GMM sample statistic log-likelihoods for text-independent speaker recognition , 1997, EUROSPEECH.

[3]  Takeo Kanade,et al.  Neural Network-Based Face Detection , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[4]  Alex Pentland,et al.  Face Recognition for Smart Environments , 2000, Computer.

[5]  Vannevar Bush,et al.  As we may think , 1945, INTR.

[6]  Herbert Gish,et al.  Speaker verification with limited enrollment data , 1997, EUROSPEECH.

[7]  Takeo Kanade,et al.  Rotation Invariant Neural Network-Based Face Detection , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[8]  Takeo Kanade,et al.  Name-It: association of face and name in video , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[9]  Alex Pentland,et al.  An Interactive Computer Vision System DyPERS: Dynamic Personal Enhanced Reality System , 1999, ICVS.

[11]  Jim Gray,et al.  What Next? A Few Remaining Problems in Information Technology , 1998, ACM SIGMOD Conference.

[12]  Bradley J. Rhodes,et al.  The wearable remembrance agent: A system for augmented memory , 1997, Digest of Papers. First International Symposium on Wearable Computers.

[13]  Takeo Kanade,et al.  Human Face Detection in Visual Scenes , 1995, NIPS.

[14]  Robert Frischholz,et al.  BioID: A Multimodal Biometric Identification System , 2000, Computer.

[15]  Bruce W. Schmeiser,et al.  Improving model accuracy using optimal linear combinations of trained neural networks , 1995, IEEE Trans. Neural Networks.

[16]  Jiri Matas,et al.  On Combining Classifiers , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[17]  Thorsten Joachims,et al.  Making large scale SVM learning practical , 1998 .

[18]  Bernhard Schölkopf,et al.  Comparing support vector machines with Gaussian kernels to radial basis function classifiers , 1997, IEEE Trans. Signal Process..

[19]  King-Sun Fu,et al.  IEEE Transactions on Pattern Analysis and Machine Intelligence Publication Information , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Takeo Kanade,et al.  Probabilistic modeling of local appearance and spatial relationships for object recognition , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).