Linear Regression-based Classifier for audio visual person identification

This paper presents an audio visual (AV) person identification system based on the Linear Regression-based Classifier (LRC). Class-specific models are created by stacking q-dimensional speech and image feature vectors from the training data. Person identification is cast as a linear regression problem: a test (speech or image) feature vector is expressed as a linear combination of the (speech or image) model of the class it belongs to. The Euclidean distances between a test feature vector and the estimated response vectors of all the class-specific models are used as matching scores. The matching scores from both modalities are normalized using the min-max score normalization technique and then combined using the sum rule of fusion. The system was tested on 88 subjects from the AusTalk AV database. Experimental results show that the identification accuracy after AV fusion is higher than the accuracy achieved with either modality alone.
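The per-modality regression step can be illustrated with a minimal NumPy sketch. It assumes each class-specific model is stored as a q x p_i matrix whose columns are that class's training feature vectors, and that feature extraction (not specified here) has already been done. The helper names `lrc_distances` and `lrc_identify` are hypothetical, and `np.linalg.lstsq` is used in place of explicitly forming (X^T X)^-1 X^T y, which yields the same least-squares estimate when the model matrix has full column rank.

```python
import numpy as np


def lrc_distances(class_models, y):
    """Euclidean distance between test vector y and its least-squares
    reconstruction from each class-specific model (hypothetical helper).

    class_models: list of (q, p_i) arrays; by assumption, the columns are
                  the training feature vectors of class i.
    y:            (q,) test feature vector from the same modality.
    """
    distances = []
    for X in class_models:
        # beta_hat minimizes ||y - X beta||_2; it equals (X^T X)^-1 X^T y
        # when X has full column rank (typically p_i < q).
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        y_hat = X @ beta_hat  # estimated response vector for this class
        distances.append(np.linalg.norm(y - y_hat))
    return np.array(distances)


def lrc_identify(class_models, y):
    """Assign y to the class whose model reconstructs it with the smallest
    distance; also return all distances as matching scores."""
    d = lrc_distances(class_models, y)
    return int(np.argmin(d)), d
```

Because the distances serve as matching scores, a smaller value indicates a better match, which is why the decision uses `argmin`.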

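Score normalization and fusion can be sketched in the same way. This sketch assumes min-max normalization is applied across the per-class scores of a single test sample and that the sum rule is unweighted; the function names `min_max_normalize` and `sum_rule_fusion` are hypothetical. Since the scores are distances, the fused decision selects the class with the smallest sum.

```python
import numpy as np


def min_max_normalize(scores):
    """Map a vector of matching scores to [0, 1] (min-max normalization)."""
    scores = np.asarray(scores, dtype=float)
    lo, hi = scores.min(), scores.max()
    if hi == lo:  # all scores equal: no ranking information to preserve
        return np.zeros_like(scores)
    return (scores - lo) / (hi - lo)


def sum_rule_fusion(speech_dist, face_dist):
    """Fuse per-class distances from two modalities with an unweighted sum.

    Both inputs are distances (smaller = better match), so the fused
    decision is the class with the smallest normalized sum.
    """
    fused = min_max_normalize(speech_dist) + min_max_normalize(face_dist)
    return int(np.argmin(fused)), fused
```

For example, with speech distances `[4.0, 1.0, 3.0]` and face distances `[0.2, 0.5, 0.9]`, the normalized sums are about `[1.0, 0.43, 1.67]`, so class 1 is selected even though the face scores alone favor class 0.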