Magnitude-Preserving Ranking for Structured Outputs

In this paper, we present a novel method for structured prediction that combines Input Output Kernel Regression (IOKR) with an extension of magnitude-preserving ranking to structured output spaces. We concentrate on the case where a set of candidate outputs is given, so that the associated pre-image problem reduces to ranking the candidates. Our method, called magnitude-preserving IOKR, aims both to produce a good approximation of the output feature vectors and to preserve the magnitude differences between the output features within the candidate sets. For the case where the candidate set does not contain the corresponding 'correct' inputs, we propose approximating these inputs by applying IOKR in the reverse direction. We apply our method to two learning problems: cross-lingual document retrieval and metabolite identification. Experiments show that the proposed approach improves performance over IOKR, and in metabolite identification it obtains the current state-of-the-art accuracy.
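
To make the candidate-ranking setting concrete, the following is a minimal sketch of the plain IOKR baseline only, not of the magnitude-preserving variant, whose objective is not spelled out here. It assumes a scalar Gaussian input kernel combined with the identity operator, a Gaussian output kernel, and a ridge parameter `lam`. Under these assumptions the score of a candidate y for an input x is k_Y(y)^T (K_X + lam I)^{-1} k_X(x). The function names `gaussian_kernel` and `iokr_rank` and all parameter values are hypothetical and chosen for illustration.

```python
import numpy as np


def gaussian_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def iokr_rank(x, candidates, X_train, Y_train, lam=1e-3, gamma_x=1.0, gamma_y=0.5):
    """Rank candidate outputs for one input x with a plain IOKR-style score.

    Assumption: ridge-regularized regression into the output feature space,
    so that score(y) = k_Y(y)^T (K_X + lam I)^{-1} k_X(x).
    Returns candidate indices (best first) and the raw scores.
    """
    n = X_train.shape[0]
    K_x = gaussian_kernel(X_train, X_train, gamma_x)
    k_x = gaussian_kernel(X_train, x[None, :], gamma_x)[:, 0]
    # Coefficients of the predicted output feature vector over training outputs.
    alpha = np.linalg.solve(K_x + lam * np.eye(n), k_x)
    # Inner products between candidate features and training output features.
    K_yc = gaussian_kernel(candidates, Y_train, gamma_y)
    scores = K_yc @ alpha
    order = np.argsort(-scores)
    return order, scores
```

Because a Gaussian output kernel gives unit-norm feature vectors, ranking by this inner product is equivalent to ranking by distance to the predicted output feature vector. A call such as `order, scores = iokr_rank(x, candidates, X_train, Y_train)` returns the candidate indices sorted from best to worst.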
