Scalable Calibration of Affinity Matrices from Incomplete Observations

Estimating pairwise affinity matrices for given data samples is a basic problem in data processing applications. When the samples are not fully observed, the affinities cannot be determined exactly and approximate estimates must be sought. In this paper, we investigate calibration approaches that improve the quality of an approximate affinity matrix. Projecting the matrix onto a closed and convex subset of matrices that satisfy specific constraints yields a calibrated result that is guaranteed to be nearer to the unknown true affinity matrix than the uncalibrated matrix, except in the rare cases where the two are identical. To realize the calibration, we develop two simple, efficient, and effective algorithms that scale well: one applies cyclic updates and the other applies parallel updates. In a series of evaluations, the empirical results confirm the theoretical benefits of the proposed algorithms and demonstrate their high potential in practical applications.
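
The projection idea can be illustrated with a small sketch. The abstract does not state which constraints define the convex set, so the code below assumes, purely for illustration, that a valid affinity matrix is symmetric positive semidefinite with entries in [0, 1] and a unit diagonal (as holds for cosine similarity of nonnegative vectors). It uses a Dykstra-style cyclic projection onto the two constraint sets; the function names `project_psd`, `project_box`, and `calibrate` are hypothetical and are not taken from the paper, and the parallel-update variant is not shown.

```python
import numpy as np

def project_psd(S):
    """Nearest positive semidefinite matrix in Frobenius norm."""
    S = (S + S.T) / 2.0                      # symmetrize first
    w, V = np.linalg.eigh(S)                 # eigendecomposition of a symmetric matrix
    return (V * np.clip(w, 0.0, None)) @ V.T # drop negative eigenvalues

def project_box(S):
    """Clip entries to [0, 1] and force a unit diagonal (assumed constraints)."""
    P = np.clip(S, 0.0, 1.0)
    np.fill_diagonal(P, 1.0)
    return P

def calibrate(S0, n_iter=200):
    """Dykstra-style cyclic projection onto the intersection of both sets."""
    X = S0.copy()
    P = np.zeros_like(S0)  # correction term for the PSD set
    Q = np.zeros_like(S0)  # correction term for the box set
    for _ in range(n_iter):
        Y = project_psd(X + P)
        P = X + P - Y
        X = project_box(Y + Q)
        Q = Y + Q - X
    return X

# Illustrative data: a valid affinity matrix corrupted by symmetric noise.
rng = np.random.default_rng(0)
F = rng.random((6, 4))
F /= np.linalg.norm(F, axis=1, keepdims=True)
T = F @ F.T                                  # "true" cosine affinities
N = rng.normal(scale=0.3, size=T.shape)
S0 = T + (N + N.T) / 2.0                     # approximate estimate
X = calibrate(S0)
print(np.linalg.norm(S0 - T), np.linalg.norm(X - T))
```

Because the true matrix lies in the assumed constraint set, the exact projection cannot be farther from it than the uncalibrated estimate; the finite number of iterations here only approximates that projection.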
