Probability and Distributions
