Regularization and Kernelization of the Maximin Correlation Approach

Robust classification becomes challenging when each class consists of multiple subclasses. Examples include multi-font optical character recognition and automated protein function prediction. In correlation-based nearest-neighbor classification, the maximin correlation approach (MCA) provides the worst-case optimal solution by minimizing the maximum misclassification risk through an iterative procedure. Despite its optimality, the original MCA has drawbacks that have limited its applicability in practice: it tends to be sensitive to outliers, cannot effectively handle nonlinearities in datasets, and suffers from high computational complexity. To address these limitations, we propose an improved solution, named regularized MCA (R-MCA). We first reformulate MCA as a quadratically constrained linear program (QCLP), incorporate regularization by introducing slack variables in the primal QCLP, and derive the corresponding Lagrangian dual. The dual formulation enables us to apply the kernel trick to R-MCA so that it can better handle nonlinearities. Our experimental results demonstrate that the regularization and kernelization make the proposed R-MCA more robust and accurate than the original MCA across various classification tasks. Furthermore, as the data size or dimensionality grows, R-MCA runs substantially faster by solving whichever of the primal and dual QCLP formulations has the smaller number of variables.
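For intuition, the sketch below shows how the QCLP view of MCA with slack-variable regularization might be written down and handed to an off-the-shelf convex solver (CVXPY). It assumes unit-normalized samples so that correlation reduces to an inner product; the regularization constant `C`, the slack variables, and the helper name `rmca_template` are illustrative assumptions and may differ from the exact R-MCA formulation and its Lagrangian dual derived in the paper.

```python
# Minimal sketch (not the paper's exact R-MCA): maximize the worst-case
# correlation of a template w with a set of subclass samples, softened by
# slack variables, as a quadratically constrained linear program (QCLP).
import numpy as np
import cvxpy as cp

def rmca_template(X, C=1.0):
    """Illustrative template fit: rows of X are unit-norm subclass samples."""
    n, d = X.shape
    w = cp.Variable(d)                 # template vector
    t = cp.Variable()                  # worst-case (minimum) correlation
    xi = cp.Variable(n, nonneg=True)   # slack variables (regularization)

    constraints = [
        X @ w >= t - xi,               # each sample's correlation >= t - xi_i
        cp.sum_squares(w) <= 1,        # single quadratic constraint -> QCLP
    ]
    objective = cp.Maximize(t - C * cp.sum(xi))
    cp.Problem(objective, constraints).solve()
    return w.value, t.value

# Usage with random unit-norm samples standing in for subclass representatives.
X = np.random.randn(5, 10)
X /= np.linalg.norm(X, axis=1, keepdims=True)
w, t = rmca_template(X, C=10.0)
print("worst-case correlation:", (X @ w).min())
```

With a large `C` the slack variables are driven toward zero and the sketch approaches a hard maximin solution; smaller values of `C` trade worst-case correlation for tolerance to outlying samples, which is the role regularization plays in R-MCA.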
