Analysis of Two Gradient-Based Algorithms for On-Line Regression
[1] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables, 1963.
[2] D. Pollard. Convergence of stochastic processes, 1984.
[3] N. Littlestone. Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm, 1987, 28th Annual Symposium on Foundations of Computer Science (SFCS 1987).
[4] Nick Littlestone et al. From on-line to batch learning, 1989, COLT '89.
[5] Thomas M. Cover et al. Elements of Information Theory, 2005.
[6] Neri Merhav et al. Universal prediction of individual sequences, 1992, IEEE Trans. Inf. Theory.
[7] David Haussler et al. Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications, 1992, Inf. Comput.
[8] Manfred K. Warmuth et al. The Weighted Majority Algorithm, 1994, Inf. Comput.
[9] Vladimir Vovk et al. A game of prediction with expert advice, 1995, COLT '95.
[10] László Györfi et al. A Probabilistic Theory of Pattern Recognition, 1996, Stochastic Modelling and Applied Probability.
[11] Philip M. Long et al. Worst-case quadratic loss bounds for prediction using linear functions and gradient descent, 1996, IEEE Trans. Neural Networks.
[12] Yoav Freund et al. Predicting a binary sequence almost as well as the optimal biased coin, 2003, COLT '96.
[13] Manfred K. Warmuth et al. How to use expert advice, 1997, JACM.
[14] Vladimir Vovk et al. Competitive On-line Linear Regression, 1997, NIPS.
[15] Dale Schuurmans et al. General Convergence Results for Linear Discriminant Updates, 1997, COLT.
[16] Manfred K. Warmuth et al. Exponentiated Gradient Versus Gradient Descent for Linear Predictors, 1997, Inf. Comput.
[17] Tom Bylander et al. Worst-Case Absolute Loss Bounds for Linear Learning Algorithms, 1997, AAAI/IAAI.
[18] Kenji Yamanishi et al. A Decision-Theoretic Extension of Stochastic Complexity and Its Applications to Learning, 1998, IEEE Trans. Inf. Theory.