Class consistent multi-modal fusion with binary features

Many existing recognition algorithms combine different modalities based on training accuracy but do not consider the possibility of noise at test time. We describe an algorithm that perturbs test features so that all modalities predict the same class. We enforce this perturbation to be as small as possible via a quadratic program (QP) for continuous features, and a mixed integer program (MIP) for binary features. To efficiently solve the MIP, we provide a greedy algorithm and empirically show that its solution is very close to that of a state-of-the-art MIP solver. We evaluate our algorithm on several datasets and show that the method outperforms existing approaches.

[1]  Silvio Savarese,et al.  Detecting and tracking people using an RGB-D camera via multiple detector fusion , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[2]  Massimo Tistarelli,et al.  Feature Level Fusion of Face and Fingerprint Biometrics , 2007, 2007 First IEEE International Conference on Biometrics: Theory, Applications, and Systems.

[3]  Lawrence Carin,et al.  Sparse multinomial logistic regression: fast algorithms and generalization bounds , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Svetlana Lazebnik,et al.  Iterative quantization: A procrustean approach to learning binary codes , 2011, CVPR 2011.

[5]  Mohan S. Kankanhalli,et al.  Multimodal fusion for multimedia analysis: a survey , 2010, Multimedia Systems.

[6]  Ethem Alpaydin,et al.  Multiple Kernel Learning Algorithms , 2011, J. Mach. Learn. Res..

[7]  Yiannis Aloimonos,et al.  Segmenting “simple” objects using RGB-D , 2012, 2012 IEEE International Conference on Robotics and Automation.

[8]  N. Cristianini,et al.  On Kernel-Target Alignment , 2001, NIPS.

[9]  Jonghyun Choi,et al.  Predictable Dual-View Hashing , 2013, ICML.

[10]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[11]  Sharath Pankanti,et al.  Filterbank-based fingerprint matching , 2000, IEEE Trans. Image Process..

[12]  Dieter Fox,et al.  Sparse distance learning for object recognition combining RGB and depth information , 2011, 2011 IEEE International Conference on Robotics and Automation.

[13]  Patrick J. Flynn,et al.  An evaluation of multimodal 2D+3D face biometrics , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Xiaoli Zhou,et al.  Feature Fusion of Face and Gait for Human Recognition at a Distance in Video , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[15]  David J. Fleet,et al.  Fast search in Hamming space with multi-index hashing , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Koby Crammer,et al.  On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines , 2002, J. Mach. Learn. Res..

[17]  Dieter Fox,et al.  A large-scale hierarchical multi-view RGB-D object dataset , 2011, 2011 IEEE International Conference on Robotics and Automation.

[18]  Damon L. Woodard,et al.  Non-ideal iris segmentation using graph cuts , 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[19]  Sharath Pankanti,et al.  The relation between the ROC curve and the CMC , 2005, Fourth IEEE Workshop on Automatic Identification Advanced Technologies (AutoID'05).

[20]  A. Ross,et al.  Level Fusion Using Hand and Face Biometrics , 2005 .

[21]  Cyrus Rashtchian,et al.  Every Picture Tells a Story: Generating Sentences from Images , 2010, ECCV.

[22]  Rama Chellappa,et al.  Joint Sparse Representation for Robust Multimodal Biometrics Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Ali Farhadi,et al.  Attribute Discovery via Predictable Discriminative Binary Codes , 2012, ECCV.

[24]  Manik Varma,et al.  Learning The Discriminative Power-Invariance Trade-Off , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[25]  Terrance E. Boult,et al.  Classification Enhancement via Biometric Pattern Perturbation , 2005, AVBPA.

[26]  Arun Ross,et al.  Feature level fusion of hand and face biometrics , 2005, SPIE Defense + Commercial Sensing.

[27]  Kristen Grauman,et al.  Kernelized Locality-Sensitive Hashing , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Dong Liu,et al.  Sample-Specific Late Fusion for Visual Category Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Andrew Zisserman,et al.  Multiple kernels for object detection , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[30]  Dong Liu,et al.  Robust late fusion with rank minimization , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Dieter Fox,et al.  Depth kernel descriptors for object recognition , 2011, 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[32]  Dieter Fox,et al.  RGB-(D) scene labeling: Features and algorithms , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Dieter Fox,et al.  Kernel Descriptors for Visual Recognition , 2010, NIPS.

[34]  Lorenzo Torresani,et al.  Scalable object-class retrieval with approximate and top-k ranking , 2011, 2011 International Conference on Computer Vision.

[35]  Cordelia Schmid,et al.  Transformation Pursuit for Image Classification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Libor Masek,et al.  MATLAB Source Code for a Biometric Identification System Based on Iris Patterns , 2003 .

[37]  Cristian Sminchisescu,et al.  Efficient Match Kernel between Sets of Features for Visual Recognition , 2009, NIPS.