RABOC: An approach to handle class imbalance in multimodal biometric authentication

Abstract: Class imbalance poses serious difficulties for most standard two-class classifiers when they are applied to multimodal biometric authentication. Most conventional classifiers assume balanced classes and do not work well when impostor samples vastly outnumber samples of the genuine user class. In this paper, we propose an algorithm, called RABOC, which combines the natural capabilities of one-class classification and the Real AdaBoost algorithm to handle the class imbalance problem in biometric systems. In particular, we develop a weak classifier that consists of one-class classifiers and is trained using data from both classes. We then use Real AdaBoost to combine multiple such weak classifiers in order to improve their performance without causing overfitting. Unlike in conventional Real AdaBoost, the weak classifiers in the proposed scheme are learned on the same data set but with different parameter choices. This not only generates the diversity necessary to make RABOC work, but also reduces the number of user-specified parameters. Extensive experiments were carried out on the BioSecure DS2 and XM2VTS benchmark databases, which involve data with extremely imbalanced class distributions. They demonstrate that the proposed RABOC algorithm achieves relative performance improvements of 28%, 24%, and 22% over other state-of-the-art techniques, specifically the sum of scores, likelihood ratio based score fusion, and Support Vector Machines, respectively.
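The abstract gives no implementation details, so the following is only a minimal illustrative sketch of the general idea, not the RABOC algorithm itself. It assumes that each weak classifier is built from two one-class models (here, weighted diagonal Gaussians, one per class), that the varied parameter is a variance regularizer `reg`, and that the weak output is half the log-odds, as in the standard Real AdaBoost formulation. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_gaussian(X, w, reg):
    """Weighted diagonal Gaussian: a simple one-class model of a single class."""
    w = w / w.sum()
    mean = w @ X
    var = w @ (X - mean) ** 2 + reg  # reg is the per-round parameter (assumption)
    return mean, var

def log_density(X, model):
    mean, var = model
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mean) ** 2 / var, axis=1)

def weak_output(X, rnd, eps=1e-6):
    """Real AdaBoost weak output f(x) = 0.5 * log(p / (1 - p)), clipped."""
    genuine, impostor, log_prior = rnd
    log_odds = log_density(X, genuine) - log_density(X, impostor) + log_prior
    limit = 0.5 * np.log((1 - eps) / eps)
    return np.clip(0.5 * log_odds, -limit, limit)

def fit_boosted(X, y, regs):
    """y in {-1, +1}. One boosting round per value in regs, same data each round."""
    w = np.full(len(y), 1.0 / len(y))
    pos, neg = y == 1, y == -1
    rounds = []
    for reg in regs:
        rnd = (
            fit_gaussian(X[pos], w[pos], reg),   # one-class model of genuine users
            fit_gaussian(X[neg], w[neg], reg),   # one-class model of impostors
            np.log(w[pos].sum()) - np.log(w[neg].sum()),  # weighted class prior
        )
        rounds.append(rnd)
        w = w * np.exp(-y * weak_output(X, rnd))  # Real AdaBoost weight update
        w /= w.sum()
    return rounds

def decision_scores(X, rounds):
    """Sum of weak outputs; the sign gives the accept/reject decision."""
    return sum(weak_output(X, rnd) for rnd in rounds)

# Synthetic, heavily imbalanced match scores from two hypothetical modalities.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(3.0, 0.5, (20, 2)), rng.normal(0.0, 0.5, (2000, 2))])
y = np.concatenate([np.ones(20), -np.ones(2000)])
rounds = fit_boosted(X, y, regs=[0.01, 0.05, 0.1, 0.5, 1.0])
scores = decision_scores(X, rounds)
```

In this sketch the diversity across rounds comes from two sources: the changing `reg` value and the reweighting of samples, while the training set itself stays fixed.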
