Empirical Inference

This short contribution presents the first paper in which Vapnik and Chervonenkis describe the foundations of Statistical Learning Theory [10] (Vapnik, Chervonenkis (1968) Proc USSR Acad Sci 181(4): 781–783). The original paper was published in the Doklady, the Proceedings of the USSR Academy of Sciences, in 1968. An English translation was published the same year in Soviet Mathematics, a journal from the American Mathematical Society publishing translations of the mathematical section of the Doklady.¹ The importance of the work of Vapnik and Chervonenkis was noticed immediately. Dudley begins his 1969 review for Mathematical Reviews [3] with the simple sentence “the following very interesting results are announced.”

This concise paper is historically more interesting than the celebrated 1971 paper [11] because the three-page limit forced its authors to reveal what they considered essential. Every word in this paper counts. In particular, the introduction explains that a uniform law of large numbers “is necessary in the construction of learning algorithms.”

The mention of learning algorithms in 1968 seems to be an anachronism. In fact, learning machines were a popular subject in the 1960s at the Institute of Automation and Remote Control in Moscow. The trend possibly started with the work of Aizerman and collaborators on pattern recognition [1] and the work

¹ A slightly modified version of this English translation of the 1968 paper follows this brief introduction.

L. Bottou, Microsoft Research, 1 Microsoft Way, Redmond, WA 98052, USA. e-mail: leon@bottou.org

B. Schölkopf et al. (eds.), Empirical Inference, DOI 10.1007/978-3-642-41136-6_1, © Springer-Verlag Berlin Heidelberg 2013

[1]  Gunnar Rätsch,et al.  Leveraging Sequence Classification by Taxonomy-Based Multitask Learning , 2010, RECOMB.

[2]  John C. Snyder,et al.  Orbital-free bond breaking via machine learning. , 2013, The Journal of chemical physics.

[3]  Jean-Philippe Vert,et al.  Efficient peptide-MHC-I binding prediction for alleles with few known binders , 2008, Bioinform..

[4]  Nello Cristianini,et al.  Learning the Kernel Matrix with Semidefinite Programming , 2002, J. Mach. Learn. Res..

[5]  Jonathan Baxter,et al.  A Model of Inductive Bias Learning , 2000, J. Artif. Intell. Res..

[6]  Rich Caruana,et al.  Multitask Learning: A Knowledge-Based Source of Inductive Bias , 1993, ICML.

[7]  Bernhard E. Boser,et al.  A training algorithm for optimal margin classifiers , 1992, COLT '92.

[8]  Volker Tresp,et al.  Scaling Kernel-Based Systems to Large Data Sets , 2001, Data Mining and Knowledge Discovery.

[9]  Olga G. Troyanskaya,et al.  Simultaneous Genome-Wide Inference of Physical, Genetic, Regulatory, and Functional Pathway Components , 2010, PLoS Comput. Biol..

[10]  Masashi Sugiyama,et al.  Salient Object Detection Based on Direct Density-ratio Estimation , 2012 .

[11]  Jing Wang Improve local tangent space alignment using various dimensional local coordinates , 2008, Neurocomputing.

[12]  Yves Grandvalet,et al.  SimpleMKL , 2008 .

[13]  M. Yuan,et al.  Model selection and estimation in regression with grouped variables , 2006 .

[14]  Gunnar Rätsch,et al.  Large Scale Multiple Kernel Learning , 2006, J. Mach. Learn. Res..

[15]  Masashi Sugiyama,et al.  Relative Density-Ratio Estimation for Robust Distribution Comparison , 2011 .

[16]  Jean-Philippe Vert,et al.  ProDiGe: Prioritization Of Disease Genes with multitask machine learning from positive and unlabeled examples , 2011, BMC Bioinformatics.

[17]  Hal Daumé,et al.  Frustratingly Easy Domain Adaptation , 2007, ACL.

[18]  T. Ben-David,et al.  Exploiting Task Relatedness for Multiple , 2003 .

[19]  Rich Caruana,et al.  Multitask Learning , 1997, Machine-mediated learning.

[20]  Gunnar Rätsch,et al.  ARTS: accurate recognition of transcription starts in human , 2006, ISMB.

[21]  Tong Zhang,et al.  A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , 2005, J. Mach. Learn. Res..

[22]  David Heckerman,et al.  Leveraging Information Across HLA Alleles/Supertypes Improves Epitope Prediction , 2006, RECOMB.

[23]  Masashi Sugiyama,et al.  Detection of activities and events without explicit categorization , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[24]  Massimiliano Pontil,et al.  Multi-Task Feature Learning , 2006, NIPS.

[25]  Bernhard Schölkopf,et al.  Injective Hilbert Space Embeddings of Probability Measures , 2008, COLT.

[26]  Charles A. Micchelli,et al.  Learning Multiple Tasks with Kernel Methods , 2005, J. Mach. Learn. Res..

[27]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[28]  C. Zălinescu Convex analysis in general vector spaces , 2002 .

[29]  Michael E. Tipping The Relevance Vector Machine , 1999, NIPS.

[30]  R. Tibshirani,et al.  Sparsity and smoothness via the fused lasso , 2005 .

[31]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[32]  Gunnar Rätsch,et al.  Inferring latent task structure for Multitask Learning by Multiple Kernel Learning , 2010, BMC Bioinformatics.

[33]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[34]  Alexander Zien,et al.  lp-Norm Multiple Kernel Learning , 2011, J. Mach. Learn. Res..

[35]  Massimiliano Pontil,et al.  Regularized multi-task learning , 2004, KDD.

[36]  Gilles Blanchard,et al.  Generalizing from Several Related Classification Tasks to a New Unlabeled Sample , 2011, NIPS.

[37]  Masashi Sugiyama,et al.  Sufficient Component Analysis , 2011, ACML.

[38]  Gunnar Rätsch,et al.  An Empirical Analysis of Domain Adaptation Algorithms for Genomic Sequence Analysis , 2008, NIPS.

[39]  Taiji Suzuki,et al.  SpicyMKL: a fast algorithm for Multiple Kernel Learning with thousands of kernels , 2011, Machine Learning.

[40]  Paul Tseng,et al.  Approximation accuracy, gradient methods, and error bound for structured convex optimization , 2010, Math. Program..

[41]  Gunnar Rätsch,et al.  Multitask Learning in Computational Biology , 2012, ICML Unsupervised and Transfer Learning.