Analysis of two gradient-based algorithms for on-line regression
[1] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables, 1963.
[2] D. Pollard. Convergence of Stochastic Processes, 1984.
[3] N. Littlestone. Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm, 1987, 28th Annual Symposium on Foundations of Computer Science (SFCS 1987).
[4] Nick Littlestone et al. From on-line to batch learning, 1989, COLT '89.
[5] Manfred K. Warmuth et al. The weighted majority algorithm, 1989, 30th Annual Symposium on Foundations of Computer Science.
[6] Vladimir Vovk et al. Aggregating strategies, 1990, COLT '90.
[7] Thomas M. Cover et al. Elements of Information Theory, 2005.
[8] Philip M. Long et al. On-line learning of linear functions, 1991, STOC '91.
[9] Neri Merhav et al. Universal prediction of individual sequences, 1992, IEEE Trans. Inf. Theory.
[10] David Haussler et al. Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications, 1992, Inf. Comput.
[11] David Haussler et al. How to use expert advice, 1993, STOC.
[12] Manfred K. Warmuth et al. Using experts for predicting continuous outcomes, 1994, European Conference on Computational Learning Theory.
[13] Vladimir Vovk et al. A game of prediction with expert advice, 1995, COLT '95.
[14] László Györfi et al. A Probabilistic Theory of Pattern Recognition, 1996, Stochastic Modelling and Applied Probability.
[15] Philip M. Long et al. Worst-case quadratic loss bounds for prediction using linear functions and gradient descent, 1996, IEEE Trans. Neural Networks.
[16] Yoav Freund et al. Predicting a binary sequence almost as well as the optimal biased coin, 2003, COLT '96.
[17] Manfred K. Warmuth et al. How to use expert advice, 1997, JACM.
[18] Vladimir Vovk et al. Competitive On-line Linear Regression, 1997, NIPS.
[19] Dale Schuurmans et al. General Convergence Results for Linear Discriminant Updates, 1997, COLT '97.
[21] Manfred K. Warmuth et al. Exponentiated Gradient Versus Gradient Descent for Linear Predictors, 1997, Inf. Comput.
[22] Tom Bylander et al. Worst-Case Absolute Loss Bounds for Linear Learning Algorithms, 1997, AAAI/IAAI.
[23] Kenji Yamanishi et al. A Decision-Theoretic Extension of Stochastic Complexity and Its Applications to Learning, 1998, IEEE Trans. Inf. Theory.