Efficient Pairwise Learning Using Kernel Ridge Regression: an Exact Two-Step Method

Pairwise learning, or dyadic prediction, concerns predicting properties of pairs of objects. It can be seen as an umbrella covering various machine learning problems, such as matrix completion, collaborative filtering, multi-task learning, transfer learning, network prediction and zero-shot learning. In this work we analyze kernel-based methods for pairwise learning, with a particular focus on a recently suggested two-step method. We show that this method offers an appealing alternative to the commonly applied Kronecker-based methods, which model dyads by means of pairwise feature representations and pairwise kernels. In a series of theoretical results, we establish correspondences between the two types of methods in terms of linear algebra and spectral filtering, and we analyze their statistical consistency. In addition, the two-step method allows us to establish novel algorithmic shortcuts for efficient training and validation on very large datasets. Taken together, these properties lead us to believe that this simple yet powerful method can become a standard tool for many problems. We report extensive experimental results for a range of practical settings.
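The linear-algebra and spectral-filtering correspondence between the two families, and where the training speed-up comes from, can be illustrated with a minimal NumPy sketch. It assumes a fully observed label matrix Y (every row object paired with every column object) and symmetric positive semi-definite kernel matrices K (row objects) and G (column objects); the helper names (rbf_kernel, spectral_fit, two_step_krr, kronecker_krr, predict, two_step_loo_rows) and the Gaussian kernel are hypothetical choices for illustration. With eigendecompositions K = U diag(s) U^T and G = V diag(t) V^T, both models reduce to rescaling U^T Y V elementwise: two-step KRR uses the filter 1/((s_i + λ_u)(t_j + λ_v)), Kronecker KRR uses 1/(s_i t_j + λ). Training therefore needs two eigendecompositions instead of a solve with an nm × nm system. The leave-one-row-out function is only one example of a validation shortcut, derived here from the standard KRR leave-one-out identity; it is not claimed to be the exact shortcut from the paper.

```python
import numpy as np


def rbf_kernel(X1, X2, gamma=0.5):
    """Gaussian kernel exp(-gamma * ||x - x'||^2); an illustrative choice only."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)


def spectral_fit(K, G, Y, filt_fn):
    """Dual coefficients A = U (F * (U^T Y V)) V^T, where K = U diag(s) U^T,
    G = V diag(t) V^T and F[i, j] = filt_fn(s_i, t_j)."""
    s, U = np.linalg.eigh(K)
    t, V = np.linalg.eigh(G)
    filt = filt_fn(s[:, None], t[None, :])  # (n, m) spectral filter
    return U @ (filt * (U.T @ Y @ V)) @ V.T


def two_step_krr(K, G, Y, lam_u, lam_v):
    """A = (K + lam_u I)^-1 Y (G + lam_v I)^-1: filter 1 / ((s + lam_u)(t + lam_v))."""
    return spectral_fit(K, G, Y, lambda s, t: 1.0 / ((s + lam_u) * (t + lam_v)))


def kronecker_krr(K, G, Y, lam):
    """vec(A) = (G kron K + lam I)^-1 vec(Y): filter 1 / (s t + lam)."""
    return spectral_fit(K, G, Y, lambda s, t: 1.0 / (s * t + lam))


def predict(K_new, G_new, A):
    """f(u, v) = k(u)^T A g(v) for every (new row object, new column object) pair."""
    return K_new @ A @ G_new.T


def two_step_loo_rows(K, G, Y, lam_u, lam_v):
    """Leave-one-row-out predictions of two-step KRR without refitting.

    The row step is multi-output KRR with hat matrix H_u = (K + lam_u I)^-1 K,
    so the classic KRR identity y_loo = (y_hat - h_ii * y) / (1 - h_ii) applies
    to it; the column step then acts linearly from the right.
    """
    n, m = Y.shape
    H_u = np.linalg.solve(K + lam_u * np.eye(n), K)
    H_v = np.linalg.solve(G + lam_v * np.eye(m), G)  # (G + lam_v I)^-1 G
    h = np.diag(H_u)
    loo_row_step = (H_u @ Y - h[:, None] * Y) / (1.0 - h)[:, None]
    return loo_row_step @ H_v


# Usage on random toy data: 30 row objects, 20 column objects, fully labelled.
rng = np.random.default_rng(0)
Xu, Xv = rng.normal(size=(30, 3)), rng.normal(size=(20, 4))
K, G = rbf_kernel(Xu, Xu), rbf_kernel(Xv, Xv)
Y = rng.normal(size=(30, 20))

A_two = two_step_krr(K, G, Y, lam_u=0.1, lam_v=0.1)
A_kron = kronecker_krr(K, G, Y, lam=0.1)

# Predictions for 5 unseen row objects paired with 2 unseen column objects.
Xu_new, Xv_new = rng.normal(size=(5, 3)), rng.normal(size=(2, 4))
F_new = predict(rbf_kernel(Xu_new, Xu), rbf_kernel(Xv_new, Xv), A_two)

# Validation when whole row objects are held out, without refitting n models.
F_loo = two_step_loo_rows(K, G, Y, lam_u=0.1, lam_v=0.1)
```

Expanding the two-step filter gives 1/(s_i t_j + λ_v s_i + λ_u t_j + λ_u λ_v), so in this sketch two-step KRR differs from Kronecker KRR only in its spectral filter; equivalently, it solves the Kronecker system built from the shifted kernels (G + λ_v I) and (K + λ_u I).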
