Feature selection, L1 vs. L2 regularization, and rotational invariance
[1] 丸山 徹. Convex Analysisの二,三の進展について [On a Few Developments in Convex Analysis] , 1977 .
[2] R. Bordley. A Multiplicative Formula for Aggregating Probability Assessments , 1982 .
[3] Dimitri P. Bertsekas,et al. Constrained Optimization and Lagrange Multiplier Methods , 1982 .
[4] Anne Lohrli. Chapman and Hall , 1985 .
[5] N. Littlestone. Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm , 1987, 28th Annual Symposium on Foundations of Computer Science (sfcs 1987).
[6] Leslie G. Valiant,et al. A general lower bound on the number of examples needed for learning , 1988, COLT '88.
[7] D. Pollard. Empirical Processes: Theory and Applications , 1990 .
[8] Audra E. Kosh,et al. Linear Algebra and its Applications , 1992 .
[9] John Riedl,et al. GroupLens: an open architecture for collaborative filtering of netnews , 1994, CSCW '94.
[10] R. Tibshirani. Regression Shrinkage and Selection via the Lasso , 1996 .
[11] Ron Kohavi,et al. Wrappers for Feature Subset Selection , 1997, Artif. Intell.
[12] Pat Langley,et al. Selection of Relevant Features and Examples in Machine Learning , 1997, Artif. Intell.
[13] Manfred K. Warmuth,et al. Exponentiated Gradient Versus Gradient Descent for Linear Predictors , 1997, Inf. Comput.
[14] Tom Heskes,et al. Selecting Weighting Factors in Logarithmic Opinion Pools , 1997, NIPS.
[15] Peter L. Bartlett,et al. The Sample Complexity of Pattern Classification with Neural Networks: The Size of the Weights is More Important than the Size of the Network , 1998, IEEE Trans. Inf. Theory.
[16] David Heckerman,et al. Empirical Analysis of Predictive Algorithms for Collaborative Filtering , 1998, UAI.
[17] Vladimir Vapnik,et al. Statistical learning theory , 1998 .
[18] Andrew Y. Ng,et al. On Feature Selection: Learning with Exponentially Many Irrelevant Features as Training Examples , 1998, ICML.
[19] Andrew McCallum,et al. Using Maximum Entropy for Text Classification , 1999 .
[20] Peter L. Bartlett,et al. Neural Network Learning - Theoretical Foundations , 1999 .
[21] David G. Stork,et al. Pattern Classification (2nd ed.) , 1999 .
[22] Trevor Hastie,et al. The Elements of Statistical Learning , 2001 .
[23] Thomas Hofmann,et al. Learning What People (Don't) Want , 2001, ECML.
[24] Michael I. Jordan,et al. Convergence rates of the Voting Gibbs classifier, with application to Bayesian feature selection , 2001, ICML.
[25] Shigeo Abe. Pattern Classification , 2001, Springer London.
[26] Tong Zhang,et al. Covering Number Bounds of Certain Regularized Linear Function Classes , 2002, J. Mach. Learn. Res.
[27] Eric R. Ziegel,et al. Generalized Linear Models , 2002, Technometrics.
[28] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.
[29] John F. Canny,et al. Collaborative filtering with privacy via factor analysis , 2002, SIGIR '02.
[30] Bernhard Schölkopf,et al. Use of the Zero-Norm with Linear Models and Kernel Methods , 2003, J. Mach. Learn. Res.
[31] Michael I. Jordan,et al. Statistical Debugging of Sampled Programs , 2003, NIPS.
[32] Benjamin M. Marlin,et al. Modeling User Rating Profiles For Collaborative Filtering , 2003, NIPS.
[33] Benjamin M. Marlin,et al. Collaborative Filtering: A Machine Learning Perspective , 2004 .
[34] V. Vapnik. Estimation of Dependences Based on Empirical Data , 2006 .