Introduction to Statistical Machine Learning
[1] R. Fisher. The Use of Multiple Measurements in Taxonomic Problems, 1936.
[2] C. E. Shannon, et al. A mathematical theory of communication, 1948.
[3] N. Aronszajn. Theory of Reproducing Kernels, 1950.
[4] R. A. Leibler, et al. On Information and Sufficiency, 1951.
[5] T. W. Anderson. An Introduction to Multivariate Statistical Analysis, 1959.
[6] C. Quesenberry, et al. A nonparametric estimate of a multivariate density function, 1965.
[7] S. M. Ali, et al. A General Class of Coefficients of Divergence of One Distribution from Another, 1966.
[8] Shun-ichi Amari, et al. A Theory of Adaptive Pattern Classifiers, 1967, IEEE Trans. Electron. Comput.
[9] Donald Ervin Knuth, et al. The Art of Computer Programming, 1968.
[10] W. K. Hastings, et al. Monte Carlo Sampling Methods Using Markov Chains and Their Applications, 1970.
[11] A. E. Hoerl, et al. Ridge Regression: Applications to Nonorthogonal Problems, 1970.
[12] H. Akaike. A new look at the statistical model identification, 1974.
[13] M. Stone. An Asymptotic Equivalence of Choice of Model by Cross-Validation and Akaike's Criterion, 1977.
[14] P. Holland, et al. Robust regression using iteratively reweighted least-squares, 1977.
[15] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm (with discussion), 1977.
[16] J. Rissanen, et al. Modeling by Shortest Data Description, 1978, Autom.
[17] C. F. J. Wu. On the Convergence Properties of the EM Algorithm, 1983.
[18] Donald Geman, et al. Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images, 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[19] Geoffrey E. Hinton, et al. Learning representations by back-propagating errors, 1986, Nature.
[20] B. Silverman. Density estimation for statistics and data analysis, 1986.
[21] Adrian F. M. Smith, et al. Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods (with discussion), 1993.
[22] G. Wahba. Spline models for observational data, 1990.
[23] Ker-Chau Li, et al. Sliced Inverse Regression for Dimension Reduction, 1991.
[24] Bernhard E. Boser, et al. A training algorithm for optimal margin classifiers, 1992, COLT '92.
[25] D. W. Scott, et al. Multivariate Density Estimation: Theory, Practice and Visualization, 1992.
[26] C. R. Rao, et al. Information and the Accuracy Attainable in the Estimation of Statistical Parameters, 1992.
[27] Jun S. Liu, et al. The Collapsed Gibbs Sampler in Bayesian Computations with Applications to a Gene Regulation Problem, 1994.
[28] Vasile Sima, et al. Algorithms for Linear-Quadratic Optimization, 2021.
[29] G. Kitagawa, et al. Generalised information criteria in model selection, 1996.
[30] Fan Chung, et al. Spectral Graph Theory, 1996.
[31] Vladimir Vapnik, et al. Statistical learning theory, 1998.
[32] Bernhard Schölkopf, et al. Nonlinear Component Analysis as a Kernel Eigenvalue Problem, 1998, Neural Computation.
[33] Emile H. L. Aarts, et al. Boltzmann machines, 1998.
[34] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[35] Shun-ichi Amari, et al. Methods of information geometry, 2000.
[36] J. Friedman. Additive logistic regression: A statistical view of boosting (Special Invited Paper), 2000.
[37] Osamu Watanabe, et al. MadaBoost: A Modification of AdaBoost, 2000, COLT.
[38] Hans-Peter Kriegel, et al. LOF: identifying density-based local outliers, 2000, SIGMOD 2000.
[39] Bernhard Schölkopf, et al. Learning with kernels, 2001.
[40] Koby Crammer, et al. On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines, 2002, J. Mach. Learn. Res.
[41] Andrew McCallum, et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, 2001, ICML.
[42] Peter Auer, et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.
[43] I. Jolliffe. Principal Component Analysis, 2002.
[44] Mark A. Girolami, et al. Mercer kernel-based clustering in feature space, 2002, IEEE Trans. Neural Networks.
[45] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence, 2002, Neural Computation.
[46] Koby Crammer, et al. Online Passive-Aggressive Algorithms, 2003, J. Mach. Learn. Res.
[47] Kari Torkkola, et al. Feature Extraction by Non-Parametric Mutual Information Maximization, 2003, J. Mach. Learn. Res.
[48] Xiaofei He, et al. Locality Preserving Projections, 2003, NIPS.
[49] Mikhail Belkin, et al. Laplacian Eigenmaps for Dimensionality Reduction and Data Representation, 2003, Neural Computation.
[50] Michael I. Jordan, et al. Latent Dirichlet Allocation, 2001, J. Mach. Learn. Res.
[51] Leo Breiman, et al. Random Forests, 2001, Machine Learning.
[52] Michael I. Jordan, et al. Multiple kernel learning, conic duality, and the SMO algorithm, 2004, ICML.
[53] Pietro Perona, et al. Self-Tuning Spectral Clustering, 2004, NIPS.
[54] Corinna Cortes, et al. Support-Vector Networks, 1995, Machine Learning.
[55] Robert P. W. Duin, et al. Support Vector Data Description, 2004, Machine Learning.
[56] Massimiliano Pontil, et al. Regularized multi-task learning, 2004, KDD.
[57] Mark Steyvers, et al. Finding scientific topics, 2004, Proceedings of the National Academy of Sciences of the United States of America.
[58] David J. C. MacKay, et al. Information Theory, Inference, and Learning Algorithms, 2004, IEEE Transactions on Information Theory.
[59] Martin J. Wainwright, et al. On surrogate loss functions and f-divergences, 2005, math/0510521.
[60] Leo Breiman, et al. Bagging Predictors, 1996, Machine Learning.
[61] R. Tibshirani, et al. Sparsity and smoothness via the fused lasso, 2005.
[62] H. Zou, et al. Regularization and variable selection via the elastic net, 2005.
[63] Thomas Hofmann, et al. Large Margin Methods for Structured and Interdependent Output Variables, 2005, J. Mach. Learn. Res.
[64] Geoffrey E. Hinton, et al. Reducing the Dimensionality of Data with Neural Networks, 2006, Science.
[65] Christopher M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics), 2006.
[66] Kaare Brandt Petersen, et al. The Matrix Cookbook, 2006.
[67] Bernhard Schölkopf, et al. A Kernel Method for the Two-Sample-Problem, 2006, NIPS.
[68] M. Yuan, et al. Model selection and estimation in regression with grouped variables, 2006.
[69] Klaus-Robert Müller, et al. Covariate Shift Adaptation by Importance Weighted Cross Validation, 2007, J. Mach. Learn. Res.
[70] Shimon Ullman, et al. Uncovering shared structures in multiclass classification, 2007, ICML '07.
[71] Kazuyuki Aihara, et al. Classifying matrices with a spectral regularization, 2007, ICML '07.
[72] Masashi Sugiyama. Dimensionality Reduction of Multimodal Labeled Data by Local Fisher Discriminant Analysis, 2007, J. Mach. Learn. Res.
[73] Massimiliano Pontil, et al. Convex multi-task feature learning, 2008, Machine Learning.
[74] R. Tibshirani, et al. Sparse inverse covariance estimation with the graphical lasso, 2008, Biostatistics.
[75] Yoshua Bengio, et al. Extracting and composing robust features with denoising autoencoders, 2008, ICML '08.
[76] Bernhard Schölkopf, et al. Characteristic Kernels on Groups and Semigroups, 2008, NIPS.
[77] M. Kawanabe, et al. Direct importance estimation for covariate shift adaptation, 2008.
[78] Yoram Singer, et al. Efficient projections onto the l1-ball for learning in high dimensions, 2008, ICML '08.
[79] John Langford, et al. Sparse Online Learning via Truncated Gradient, 2008, NIPS.
[80] Shinichi Nakajima, et al. Semi-supervised local Fisher discriminant analysis for dimensionality reduction, 2009, Machine Learning.
[81] Karl Pearson. X. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling, 2009.
[82] Sumio Watanabe. Algebraic Geometry and Statistical Learning Theory, 2009.
[83] Pierre Hansen, et al. NP-hardness of Euclidean sum-of-squares clustering, 2008, Machine Learning.
[84] Lior Rokach, et al. Recommender Systems Handbook, 2010.
[85] Martin J. Wainwright, et al. Estimating Divergence Functionals and the Likelihood Ratio by Convex Risk Minimization, 2008, IEEE Transactions on Information Theory.
[86] Chih-Jen Lin, et al. LIBSVM: A library for support vector machines, 2011, TIST.
[87] Stephen P. Boyd, et al. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers, 2011, Found. Trends Mach. Learn.
[88] Kevin P. Murphy, et al. Machine learning: a probabilistic perspective, 2012, Adaptive computation and machine learning series.
[89] Sivaraman Balakrishnan, et al. Optimal kernel choice for large-scale two-sample tests, 2012, NIPS.
[90] Masashi Sugiyama, et al. Semi-Supervised Learning of Class Balance under Class-Prior Change by Distribution Matching, 2012, ICML.
[91] Masashi Sugiyama, et al. Sequential change-point detection based on direct density-ratio estimation, 2012, Stat. Anal. Data Min.
[92] Kenji Fukumizu, et al. Equivalence of distance-based and RKHS-based statistics in hypothesis testing, 2012, arXiv.
[93] Motoaki Kawanabe, et al. Machine Learning in Non-Stationary Environments: Introduction to Covariate Shift Adaptation, 2012, Adaptive computation and machine learning.
[94] Maria L. Rizzo, et al. Energy statistics: A class of statistics based on distances, 2013.
[95] Takafumi Kanamori, et al. Relative Density-Ratio Estimation for Robust Distribution Comparison, 2011, Neural Computation.
[96] Koby Crammer, et al. Adaptive regularization of weight vectors, 2009, Machine Learning.
[97] Masashi Sugiyama, et al. Direct Learning of Sparse Changes in Markov Networks by Density Ratio Estimation, 2013, Neural Computation.
[98] Masashi Sugiyama, et al. Coping with Class Balance Change in Classification: Class-Prior Estimation with Energy Distance, 2014.
[99] Masashi Sugiyama. Statistical Reinforcement Learning: Modern Machine Learning Approaches, 2015, Chapman and Hall/CRC machine learning and pattern recognition series.
[100] Masashi Sugiyama, et al. Direct Estimation of the Derivative of Quadratic Mutual Information with Application in Supervised Dimension Reduction, 2017, Neural Computation.