On approximating weighted sums with exponentially many terms

Multiplicative weight-update algorithms such as Winnow and Weighted Majority have been studied extensively because their on-line mistake bounds depend only logarithmically on N, the total number of inputs, which allows them to be applied to problems where N is exponentially large. However, when N is this large, we need techniques that efficiently compute the weighted sums of inputs to these algorithms. In special cases the weighted sum can be computed exactly and efficiently, but for many problems such an approach seems infeasible. We therefore explore applications of Markov chain Monte Carlo (MCMC) methods to estimate the total weight. Our methods are very general: they apply to any representation of a learning problem in which the inputs to a linear learning algorithm can be represented as states in a completely connected, untruncated Markov chain. We give worst-case theoretical guarantees for our technique and then apply it to two problems: learning DNF formulas using Winnow, and pruning classifier ensembles using Weighted Majority. Finally, we present empirical results on simulated data indicating that, in practice, the time complexity is much better than our worst-case theoretical analysis implies.
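
In Winnow and Weighted Majority, each prediction needs the total weight of the active inputs. When those inputs correspond to bit vectors p in {0,1}^n (for instance, the 2^n conjunctions satisfied by a given example when each of n variables is either omitted or included with the literal the example satisfies), that total has 2^n terms. The Python sketch below illustrates one standard MCMC approach to estimating such a total; it is not the authors' algorithm, and their chain, schedule, and guarantees may differ. It pairs a lazy Metropolis chain on the hypercube with a product estimator over exponents 0 = ρ_0 < ρ_1 < ... < ρ_r = 1. Writing Z(ρ) = Σ_p w(p)^ρ, we have Z(0) = 2^n, and each ratio Z(ρ_i)/Z(ρ_{i-1}) is the mean of w(p)^(ρ_i - ρ_{i-1}) when p is drawn with probability proportional to w(p)^(ρ_{i-1}); multiplying 2^n by the estimated ratios therefore estimates Z(1), the total weight. The function `estimate_total_weight`, its linear schedule and step counts, and the product-form weights in the usage lines are all hypothetical, chosen so that the exact answer Π_i (1 + e^(c_i)) is available for comparison.

```python
import math
import random
from typing import Callable, List


def estimate_total_weight(log_w: Callable[[List[int]], float], n: int,
                          num_phases: int = 20, steps_per_phase: int = 20000,
                          burn_in: int = 2000, seed: int = 0) -> float:
    """Estimate Z = sum of exp(log_w(p)) over all bit vectors p in {0,1}^n.

    Hypothetical sketch: product estimator over a linear schedule of
    exponents rho, sampling each phase with a lazy Metropolis chain.
    """
    rng = random.Random(seed)
    rhos = [k / num_phases for k in range(num_phases + 1)]
    log_z = n * math.log(2.0)          # log Z(0): all 2**n states weigh 1
    state = [0] * n                    # arbitrary starting state
    cur = log_w(state)                 # cached log w(state)
    for prev_rho, next_rho in zip(rhos, rhos[1:]):
        delta = next_rho - prev_rho
        acc = 0.0
        for t in range(burn_in + steps_per_phase):
            # Lazy step: stay put with probability 1/2, otherwise propose
            # flipping one uniformly chosen bit (a symmetric proposal).
            if rng.random() < 0.5:
                i = rng.randrange(n)
                state[i] ^= 1
                prop = log_w(state)
                # Metropolis rule for the target pi(p) proportional to
                # w(p)**prev_rho.
                log_ratio = prev_rho * (prop - cur)
                if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
                    cur = prop                # accept the flip
                else:
                    state[i] ^= 1             # reject: undo the flip
            if t >= burn_in:
                # One sample of w(p)**delta under this phase's target.
                acc += math.exp(delta * cur)
        # Add the log of the estimated ratio Z(next_rho) / Z(prev_rho).
        log_z += math.log(acc / steps_per_phase)
    return math.exp(log_z)


# Hypothetical product-form weights w(p) = exp(sum_i c_i * p_i).
coeffs = [0.5, -0.3, 1.0, 0.2, -0.8, 0.7, 0.1, -0.2, 0.4, 0.9]


def log_w(p: List[int]) -> float:
    return sum(c * b for c, b in zip(coeffs, p))


exact = math.prod(1.0 + math.exp(c) for c in coeffs)   # closed form
approx = estimate_total_weight(log_w, len(coeffs))
print(f"exact={exact:.1f} estimate={approx:.1f}")
```

Taking many small steps in ρ keeps each ratio close to 1, so each phase's average has low variance. The sketch recomputes `log_w` on every proposal for generality; for product-form weights, updating only the flipped coordinate would be cheaper. It makes no attempt to reproduce the paper's worst-case guarantees, and its step counts are arbitrary.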
