Understanding Machine Learning: From Theory to Algorithms
Shai Ben-David | Shai Shalev-Shwartz
[1] Karl Pearson. On lines and planes of closest fit to systems of points in space , 1901 .
[2] J. Hadamard. Sur les problèmes aux dérivées partielles et leur signification physique , 1902 .
[3] R. Fisher,et al. On the Mathematical Foundations of Theoretical Statistics , 1922 .
[4] J. von Neumann. Zur Theorie der Gesellschaftsspiele , 1928 .
[5] A. Tikhonov. On the stability of inverse problems , 1943 .
[6] N. G. de Bruijn. A combinatorial problem , 1946 .
[7] John von Neumann,et al. A Certain Zero-sum Two-person Game Equivalent to the Optimal Assignment Problem , 1953 .
[8] S. Agmon. The Relaxation Method for Linear Inequalities , 1954, Canadian Journal of Mathematics.
[9] H. Kuhn. The Hungarian method for the assignment problem , 1955 .
[10] Philip Wolfe,et al. An algorithm for quadratic programming , 1956 .
[11] F. Rosenblatt,et al. The perceptron: a probabilistic model for information storage and organization in the brain. , 1958, Psychological review.
[12] David L. Phillips,et al. A Technique for the Numerical Solution of Certain Integral Equations of the First Kind , 1962, JACM.
[13] Albert B. Novikoff,et al. On Convergence Proofs for Perceptrons , 1963 .
[14] M. Aizerman,et al. Theoretical Foundations of the Potential Function Method in Pattern Recognition Learning , 1964 .
[15] Thomas M. Cover,et al. Behavior of sequential predictors of binary sequences , 1965 .
[16] John Garcia,et al. Relation of cue to consequence in avoidance learning , 1966 .
[17] Peter E. Hart,et al. Nearest neighbor pattern classification , 1967, IEEE Trans. Inf. Theory.
[18] Marvin Minsky,et al. Perceptrons: An Introduction to Computational Geometry , 1969 .
[19] Vladimir Vapnik,et al. On the uniform convergence of relative frequencies of events to their probabilities , 1971 .
[20] Norbert Sauer,et al. On the Density of Families of Sets , 1972, J. Comb. Theory, Ser. A.
[21] Richard M. Karp,et al. Reducibility Among Combinatorial Problems , 1972, 50 Years of Integer Programming.
[22] S. Shelah. A combinatorial problem; stability and order for models and theories in infinitary languages. , 1972 .
[23] Ronald L. Rivest,et al. Constructing Optimal Binary Decision Trees is NP-Complete , 1976, Inf. Process. Lett..
[24] C. J. Stone,et al. Consistent Nonparametric Regression , 1977 .
[25] E. Slud. Distribution Inequalities for the Binomial Law , 1977 .
[26] W. Rogers,et al. A Finite Sample Distribution-Free Performance Bound for Local Discrimination Rules , 1978 .
[27] J. Rissanen,et al. Modeling By Shortest Data Description , 1978, Autom..
[28] G. Pisier. Remarques sur un résultat non publié de B. Maurey , 1981 .
[29] John Darzentas,et al. Problem Complexity and Method Efficiency in Optimization , 1983 .
[30] J. Rissanen. A Universal Prior for Integers and Estimation by Minimum Description Length , 1983 .
[31] Leslie G. Valiant,et al. A theory of the learnable , 1984, STOC '84.
[32] Luc Devroye,et al. Nonparametric Density Estimation , 1985 .
[33] Geoffrey E. Hinton,et al. Learning internal representations by error propagation , 1986 .
[34] L. Rabiner,et al. An introduction to hidden Markov models , 1986, IEEE ASSP Magazine.
[35] David Haussler,et al. Occam's Razor , 1987, Inf. Process. Lett..
[36] N. Littlestone. Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm , 1987, 28th Annual Symposium on Foundations of Computer Science (sfcs 1987).
[37] R. Dudley. Universal Donsker Classes and Metric Entropy , 1987 .
[38] Leslie G. Valiant,et al. Computational limitations on learning from examples , 1988, JACM.
[39] S. Smale,et al. On a theory of computation and complexity over the real numbers; NP-completeness , 1989 .
[40] Sally Floyd,et al. Space-bounded learning and the Vapnik-Chervonenkis dimension , 1989, COLT '89.
[41] David Haussler,et al. Learnability and the Vapnik-Chervonenkis dimension , 1989, JACM.
[42] Vladimir Vovk,et al. Aggregating strategies , 1990, COLT '90.
[43] Vladimir Vapnik,et al. Principles of Risk Minimization for Learning Theory , 1991, NIPS.
[44] R. Dudley,et al. Uniform and universal Glivenko-Cantelli classes , 1991 .
[45] Bernhard E. Boser,et al. A training algorithm for optimal margin classifiers , 1992, COLT '92.
[46] J. Ross Quinlan,et al. C4.5: Programs for Machine Learning , 1992 .
[47] Linda Sellie,et al. Toward efficient agnostic learning , 1992, COLT '92.
[48] David Haussler,et al. Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications , 1992, Inf. Comput..
[49] J. Hiriart-Urruty,et al. Convex analysis and minimization algorithms , 1993 .
[50] Stéphane Mallat,et al. Matching pursuits with time-frequency dictionaries , 1993, IEEE Trans. Signal Process..
[51] Jayaram K. Sankaran. A note on resolving infeasibility in linear programs by constraint relaxation , 1993, Oper. Res. Lett..
[52] Ian Parberry,et al. Circuit complexity and neural networks , 1994 .
[53] Manfred K. Warmuth,et al. The Weighted Majority Algorithm , 1994, Inf. Comput..
[54] Umesh V. Vazirani,et al. An Introduction to Computational Learning Theory , 1994 .
[55] Philip M. Long,et al. Fat-shattering and the learnability of real-valued functions , 1994, COLT '94.
[56] Dimitri P. Bertsekas,et al. Nonlinear Programming , 1997 .
[57] Philip M. Long,et al. A Generalization of Sauer's Lemma , 1995, J. Comb. Theory, Ser. A.
[58] Vladimir Vapnik,et al. The Nature of Statistical Learning , 1995 .
[59] Philip M. Long,et al. Characterizations of Learnability for Classes of {0, ..., n}-Valued Functions , 1995, J. Comput. Syst. Sci..
[60] Yoav Freund,et al. A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.
[61] Thomas G. Dietterich,et al. Solving Multiclass Learning Problems via Error-Correcting Output Codes , 1994, J. Artif. Intell. Res..
[62] Balas K. Natarajan,et al. Sparse Approximate Solutions to Linear Systems , 1995, SIAM J. Comput..
[63] Yishay Mansour,et al. On the boosting ability of top-down decision tree learning algorithms , 1996, STOC '96.
[64] Michael Sipser,et al. Introduction to the Theory of Computation , 1996, SIGA.
[65] L. Houck,et al. Foundations of Animal Behavior: Classic Papers with Commentaries , 1996 .
[66] László Györfi,et al. A Probabilistic Theory of Pattern Recognition , 1996, Stochastic Modelling and Applied Probability.
[68] R. Tibshirani. Regression Shrinkage and Selection via the Lasso , 1996 .
[69] Leo Breiman,et al. Bias, Variance, and Arcing Classifiers , 1996 .
[70] S. Mallat,et al. Adaptive greedy approximations , 1997 .
[71] Yoav Freund,et al. A decision-theoretic generalization of on-line learning and an application to boosting , 1997, EuroCOLT.
[72] David H. Wolpert,et al. No free lunch theorems for optimization , 1997, IEEE Trans. Evol. Comput..
[73] Noga Alon,et al. Scale-sensitive dimensions, uniform convergence, and learnability , 1997, JACM.
[74] Vladimir Cherkassky,et al. The Nature Of Statistical Learning Theory , 1997, IEEE Trans. Neural Networks.
[75] David A. McAllester. Some PAC-Bayesian Theorems , 1998, COLT '98.
[76] Vladimir Vapnik,et al. Statistical learning theory , 1998 .
[77] Bernhard Schölkopf,et al. Nonlinear Component Analysis as a Kernel Eigenvalue Problem , 1998, Neural Computation.
[78] Yoshua Bengio,et al. Convolutional networks for images, speech, and time series , 1998 .
[79] Yoav Freund,et al. Large Margin Classification Using the Perceptron Algorithm , 1998, COLT.
[80] S. Ben-David,et al. Combinatorial Variability of Vapnik-chervonenkis Classes with Applications to Sample Compression Schemes , 1998, Discrete Applied Mathematics.
[81] Noboru Murata,et al. A Statistical Study on On-line Learning , 1999 .
[82] Jason Weston,et al. Support vector machines for multi-class pattern recognition , 1999, ESANN.
[83] David A. McAllester. PAC-Bayesian model averaging , 1999, COLT '99.
[84] Peter L. Bartlett,et al. Neural Network Learning - Theoretical Foundations , 1999 .
[85] Geoffrey J. Gordon. Regret bounds for prediction problems , 1999, COLT '99.
[86] M. Kearns,et al. Algorithmic stability and sanity-check bounds for leave-one-out cross-validation , 1999 .
[87] Nello Cristianini,et al. An introduction to Support Vector Machines , 2000 .
[88] Adrian S. Lewis,et al. Convex Analysis And Nonlinear Optimization , 2000 .
[89] Hans Ulrich Simon,et al. Efficient Learning of Linear Perceptrons , 2000, NIPS.
[90] V. Koltchinskii,et al. Rademacher Processes and Bounding the Risk of Function Learning , 2004, math/0405338.
[91] Vladimir N. Vapnik,et al. The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.
[92] Naftali Tishby,et al. The information bottleneck method , 2000, ArXiv.
[93] Yoram Singer,et al. Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers , 2000, J. Mach. Learn. Res..
[94] Peter L. Bartlett,et al. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results , 2003, J. Mach. Learn. Res..
[95] Bernhard Schölkopf,et al. A Generalized Representer Theorem , 2001, COLT/EuroCOLT.
[96] Trevor Hastie,et al. The Elements of Statistical Learning , 2001 .
[97] David J. Kriegman,et al. From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose , 2001, IEEE Trans. Pattern Anal. Mach. Intell..
[98] Alexander J. Smola,et al. Learning with Kernels: support vector machines, regularization, optimization, and beyond , 2001, Adaptive computation and machine learning series.
[99] Koby Crammer,et al. On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines , 2002, J. Mach. Learn. Res..
[100] Andrew McCallum,et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.
[101] Bernhard Schölkopf,et al. Kernel Dependency Estimation , 2002, NIPS.
[102] Matthias W. Seeger,et al. PAC-Bayesian Generalisation Error Bounds for Gaussian Process Classification , 2003, J. Mach. Learn. Res..
[103] Partha Niyogi,et al. Almost-everywhere Algorithmic Stability and Generalization Error , 2002, UAI.
[104] André Elisseeff,et al. Stability and Generalization , 2002, J. Mach. Learn. Res..
[105] John Shawe-Taylor,et al. PAC-Bayes & Margins , 2002, NIPS.
[106] Michael Collins,et al. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms , 2002, EMNLP.
[107] Jon M. Kleinberg,et al. An Impossibility Theorem for Clustering , 2002, NIPS.
[108] O. Bousquet. Concentration Inequalities and Empirical Processes Theory Applied to the Analysis of Learning Algorithms , 2002 .
[109] P. Bartlett,et al. Hardness results for neural network approximation problems , 1999, Theor. Comput. Sci..
[110] Ben Taskar,et al. Max-Margin Markov Networks , 2003, NIPS.
[111] Manfred K. Warmuth,et al. Relating Data Compression and Learnability , 2003 .
[112] Yann LeCun,et al. Large Scale Online Learning , 2003, NIPS.
[113] David A. McAllester. Simplified PAC-Bayesian Margin Bounds , 2003, COLT.
[114] Isabelle Guyon,et al. An Introduction to Variable and Feature Selection , 2003, J. Mach. Learn. Res..
[115] Martin Zinkevich,et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent , 2003, ICML.
[116] Shai Ben-David,et al. On the difficulty of approximately maximizing agreements , 2000, J. Comput. Syst. Sci..
[117] Manfred K. Warmuth,et al. Sample Compression, Learnability, and the Vapnik-Chervonenkis Dimension , 1995, Machine Learning.
[118] Tong Zhang,et al. Solving large scale linear prediction problems using stochastic gradient descent algorithms , 2004, ICML.
[119] Claudio Gentile,et al. The Robustness of the p-Norm Algorithms , 2003, Machine Learning.
[120] Leo Breiman,et al. Random Forests , 2001, Machine Learning.
[121] Corinna Cortes,et al. Support-Vector Networks , 1995, Machine Learning.
[122] Thomas Hofmann,et al. Support vector machine learning for interdependent and structured output spaces , 2004, ICML.
[123] Carla E. Brodley,et al. Proceedings of the twenty-first international conference on Machine learning , 2004, International Conference on Machine Learning.
[124] J. Ross Quinlan,et al. Induction of Decision Trees , 1986, Machine Learning.
[125] Yurii Nesterov,et al. Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.
[126] O. Bousquet. Theory of Classification: A Survey of Recent Advances , 2004 .
[127] David J. C. MacKay,et al. Information Theory, Inference, and Learning Algorithms , 2004, IEEE Transactions on Information Theory.
[128] Trevor Darrell,et al. Nearest-Neighbor Methods in Learning and Vision: Theory and Practice (Neural Information Processing) , 2006 .
[129] Yuhong Yang,et al. Information Theory, Inference, and Learning Algorithms , 2005 .
[130] R. Schapire. The Strength of Weak Learnability , 1990, Machine Learning.
[131] Michael Collins,et al. Discriminative Reranking for Natural Language Parsing , 2000, CL.
[132] P. Bartlett,et al. Local Rademacher complexities , 2005, math/0508275.
[133] Emmanuel J. Candès,et al. Decoding by linear programming , 2005, IEEE Transactions on Information Theory.
[134] B. K. Natarajan. On Learning Sets and Functions , 1989, Machine Learning.
[135] J. Langford. Tutorial on Practical Prediction Theory for Classification , 2005, J. Mach. Learn. Res..
[136] S. Boucheron,et al. Theory of classification: a survey of some recent advances , 2005 .
[137] T. Poggio,et al. Stability Results in Learning Theory , 2005 .
[138] Dan Roth,et al. Learnability of Bipartite Ranking Functions , 2005, COLT.
[139] Thorsten Joachims,et al. A support vector method for multivariate performance measures , 2005, ICML.
[140] Sayan Mukherjee,et al. Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization , 2006, Adv. Comput. Math..
[141] V. Vapnik. Estimation of Dependences Based on Empirical Data , 2006 .
[142] Stephen P. Boyd,et al. Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.
[143] Yee Whye Teh,et al. A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.
[144] Gábor Lugosi,et al. Prediction, learning, and games , 2006 .
[145] Alexander A. Sherstov,et al. Cryptographic Hardness for Learning Intersections of Halfspaces , 2006, FOCS.
[147] Gunnar Rätsch,et al. Totally corrective boosting algorithms that maximize the margin , 2006, ICML.
[148] Peng Zhao,et al. On Model Selection Consistency of Lasso , 2006, J. Mach. Learn. Res..
[149] Alexander A. Sherstov,et al. Cryptographic Hardness for Learning Intersections of Halfspaces , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).
[150] O. Chapelle. Large margin optimization of ranking measures , 2007 .
[151] H. Robbins. A Stochastic Approximation Method , 1951 .
[152] Léon Bottou,et al. The Tradeoffs of Large Scale Learning , 2007, NIPS.
[153] Nasser M. Nasrabadi,et al. Pattern Recognition and Machine Learning , 2006, Technometrics.
[154] Lior Rokach,et al. Data Mining with Decision Trees - Theory and Applications , 2007, Series in Machine Perception and Artificial Intelligence.
[155] Yoshua Bengio,et al. Scaling learning algorithms towards AI , 2007 .
[156] Yoram Singer,et al. Pegasos: primal estimated sub-gradient solver for SVM , 2007, ICML '07.
[157] A. Beygelzimer. Multiclass Classification with Filter Trees , 2007 .
[158] Elad Hazan,et al. Logarithmic regret algorithms for online convex optimization , 2006, Machine Learning.
[159] Marc'Aurelio Ranzato,et al. Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.
[160] Yoshua Bengio,et al. Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..
[161] Ulrike von Luxburg,et al. A tutorial on spectral clustering , 2007, Stat. Comput..
[162] Shai Shalev-Shwartz,et al. Online learning: theory, algorithms and applications , 2007 .
[163] Andreas Christmann,et al. Support vector machines , 2008, Data Mining and Knowledge Discovery Handbook.
[164] S. V. N. Vishwanathan,et al. Entropy Regularized LPBoost , 2008, ALT.
[165] Ambuj Tewari,et al. Optimal Strategies and Minimax Lower Bounds for Online Convex Games , 2008, COLT.
[166] Jason Weston,et al. A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.
[167] Chih-Jen Lin,et al. A Practical Guide to Support Vector Classification , 2008 .
[168] Nathan Srebro,et al. SVM optimization: inverse dependence on training set size , 2008, ICML '08.
[169] R. DeVore,et al. A Simple Proof of the Restricted Isometry Property for Random Matrices , 2008 .
[170] Ambuj Tewari,et al. On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization , 2008, NIPS.
[171] E. Candès. The restricted isometry property and its implications for compressed sensing , 2008 .
[172] Shai Ben-David,et al. Measures of Clustering Quality: A Working Set of Axioms for Clustering , 2008, NIPS.
[173] William W. Cohen,et al. Proceedings of the 23rd international conference on Machine learning , 2006, ICML 2008.
[174] Shai Ben-David,et al. Agnostic Online Learning , 2009, COLT.
[175] Honglak Lee,et al. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.
[176] William T. Freeman,et al. Informative Sensing , 2009, ArXiv.
[177] Yurii Nesterov,et al. Primal-dual subgradient methods for convex problems , 2005, Math. Program..
[178] Alexander Shapiro,et al. Stochastic Approximation approach to Stochastic Programming , 2013 .
[179] Nir Friedman,et al. Probabilistic Graphical Models - Principles and Techniques , 2009 .
[180] Ohad Shamir,et al. Stochastic Convex Optimization , 2009, COLT.
[181] Alexander Shapiro,et al. Lectures on Stochastic Programming: Modeling and Theory , 2009 .
[182] Ohad Shamir,et al. Learning Kernel-Based Halfspaces with the Zero-One Loss , 2010, COLT 2010.
[183] Yoram Singer,et al. On the equivalence of weak learnability and linear separability: new relaxations and efficient boosting algorithms , 2010, Machine Learning.
[184] Tong Zhang,et al. Trading Accuracy for Sparsity in Optimization Problems with Sparsity Constraints , 2010, SIAM J. Optim..
[185] Ambuj Tewari,et al. Online Learning: Random Averages, Combinatorial Parameters, and Learnability , 2010, NIPS.
[186] Ohad Shamir,et al. Learnability, Stability and Uniform Convergence , 2010, J. Mach. Learn. Res..
[187] Wei-Yin Loh,et al. Classification and regression trees , 2011, WIREs Data Mining Knowl. Discov..
[188] Andrea Montanari,et al. The Noise-Sensitivity Phase Transition in Compressed Sensing , 2010, IEEE Transactions on Information Theory.
[189] Pedro M. Domingos,et al. Sum-product networks: A new deep architecture , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).
[190] Shai Ben-David,et al. Multiclass Learnability and the ERM principle , 2011, COLT.
[191] Kevin P. Murphy,et al. Machine learning - a probabilistic perspective , 2012, Adaptive computation and machine learning series.
[192] David Barber,et al. Bayesian reasoning and machine learning , 2012 .
[193] Yoav Freund,et al. Boosting: Foundations and Algorithms , 2012 .
[194] Shai Shalev-Shwartz,et al. Online Learning and Online Convex Optimization , 2012, Found. Trends Mach. Learn..
[195] Amit Daniely,et al. Multiclass Learning Approaches: A Theoretical Comparison with Implications , 2012, NIPS.
[196] Ohad Shamir,et al. Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization , 2011, ICML.
[197] Roi Livni,et al. Honest Compressions and Their Application to Compression Schemes , 2013, COLT.
[198] Ohad Shamir,et al. Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes , 2012, ICML.
[199] Roi Livni,et al. A Provably Efficient Algorithm for Training Deep Networks , 2013, ArXiv.
[200] Marc'Aurelio Ranzato,et al. Building high-level features using large scale unsupervised learning , 2011, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[201] D. L. Donoho,et al. Compressed sensing , 2006, IEEE Trans. Inf. Theory.
[202] Lee-Ad Gottlieb,et al. Efficient Classification for Metric Data , 2014, IEEE Trans. Inf. Theory.
[203] Andreas Holzinger,et al. Data Mining with Decision Trees: Theory and Applications , 2015, Online Inf. Rev..