On Boosting and the Exponential Loss

Assume that the loss is bounded in $[0, 1]$. Prove that, with probability at least $1 - \delta$ over a random draw of $T$, if $I$ is a compression set for $T$, then

$$\left| L(A(T)) - \hat{L}_{-I}(A(T)) \right| \le \sqrt{\frac{(l + 1)\log m + \log\frac{2}{\delta}}{2(m - l)}},$$

where $l$ is the size of the compression set.

Assume that we have a set of hypotheses $H = \{h_1(\cdot), h_2(\cdot), \ldots, h_k(\cdot)\}$, where each $h$ is a mapping from $X$ to $\{-1, 1\}$ (in class, we considered a slightly more general hypothesis class, which mapped to $[-1, 1]$). Our weak learner will only use hypotheses from $H$. The AdaBoost algorithm we presented in class is equivalent to the following algorithm: