Logistic Regression, AdaBoost and Bregman Distances
[1] L. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming, 1967.
[2] J. Darroch, et al. Generalized Iterative Scaling for Log-Linear Models, 1972.
[3] I. Csiszár. $I$-Divergence Geometry of Probability Distributions and Minimization Problems, 1975.
[4] Flemming Topsøe, et al. Information-theoretical optimization techniques, 1979, Kybernetika.
[5] Y. Censor, et al. An iterative row-action method for interval convex programming, 1981.
[6] I. Csiszár. Sanov Property, Generalized $I$-Projection and a Conditional Limit Theorem, 1984.
[7] I. Csiszár. Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems, 1991.
[8] Philip M. Long, et al. On-line learning of linear functions, 1991, STOC '91.
[9] Hans Ulrich Simon, et al. Robust Trainability of Single Neurons, 1995, J. Comput. Syst. Sci.
[10] Manfred K. Warmuth, et al. Bounds on approximate steepest descent for likelihood maximization in exponential families, 1994, IEEE Trans. Inf. Theory.
[11] Manfred K. Warmuth, et al. Additive versus exponentiated gradient updates for linear prediction, 1995, STOC '95.
[12] I. Csiszár. Generalized projections for non-negative functions, 1995, Proceedings of 1995 IEEE International Symposium on Information Theory.
[13] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.
[14] I. Csiszár. Maxent, Mathematics, and Information Theory, 1996.
[15] Adam L. Berger, et al. A Maximum Entropy Approach to Natural Language Processing, 1996, CL.
[16] Joachim M. Buhmann, et al. Pairwise Data Clustering by Deterministic Annealing, 1997, IEEE Trans. Pattern Anal. Mach. Intell.
[17] Y. Censor, et al. Parallel Optimization: Theory, Algorithms, and Applications, 1997.
[18] L. Breiman. Arcing the edge, 1997.
[19] Manfred K. Warmuth, et al. Exponentiated Gradient Versus Gradient Descent for Linear Predictors, 1997, Inf. Comput.
[20] S. D. Pietra, et al. Statistical Learning Algorithms Based on Bregman Distances, 1997.
[21] John D. Lafferty, et al. Inducing Features of Random Fields, 1995, IEEE Trans. Pattern Anal. Mach. Intell.
[22] Yoram Singer, et al. Improved Boosting Algorithms Using Confidence-rated Predictions, 1998, COLT '98.
[23] Rajeev Sharma, et al. Advances in Neural Information Processing Systems 11, 1999.
[24] Yoav Freund, et al. The Alternating Decision Tree Learning Algorithm, 1999, ICML.
[25] Osamu Watanabe. From Computational Learning Theory to Discovery Science, 1999, ICALP.
[26] Yoram Singer, et al. A simple, fast, and effective rule learner, 1999, AAAI.
[27] Leo Breiman, et al. Prediction Games and Arcing Algorithms, 1999, Neural Computation.
[28] J. Lafferty. Additive models, boosting, and inference for generalized divergences, 1999, COLT '99.
[29] Manfred K. Warmuth, et al. Boosting as entropy projection, 1999, COLT '99.
[30] David P. Helmbold, et al. Potential Boosters?, 1999, NIPS.
[31] Osamu Watanabe, et al. Scaling Up a Boosting-Based Learner via Adaptive Sampling, 2000, PAKDD.
[32] J. Friedman. Additive logistic regression: A statistical view of boosting (Special Invited Paper), 2000.
[33] Peter L. Bartlett, et al. Functional Gradient Techniques for Combining Hypotheses, 2000.
[34] Sally A. Goldman, et al. Proceedings of the Thirteenth Annual Conference on Computational Learning Theory (COLT 2000), Palo Alto, California, USA, June 28 - July 1, 2000.
[35] S. D. Pietra, et al. Duality and Auxiliary Functions for Bregman Distances, 2001.
[36] Gunnar Rätsch, et al. Soft Margins for AdaBoost, 2001, Machine Learning.
[37] Manfred K. Warmuth, et al. Relative Loss Bounds for Multidimensional Regression Problems, 1997, Machine Learning.
[38] Yoram Singer, et al. BoosTexter: A Boosting-based System for Text Categorization, 2000, Machine Learning.