Mistake bounds and logarithmic linear-threshold learning algorithms

We consider the problem of learning from examples. In each of a sequence of trials, the learner observes an instance to be classified and must respond with a prediction of its correct classification; following the prediction, the learner is told the correct response. We consider the two-category case and work for the most part with instances composed of a number of Boolean attributes or, more generally, chosen from $[0,1]^n$. Initially, we assume that the correct response in each trial is a function of the corresponding instance, this target function being chosen from some target class of $\{0,1\}$-valued functions; we relax this assumption somewhat later. We focus on the evaluation of on-line predictive performance, counting the number of mistakes the learner makes during the learning process. For certain target classes we have found algorithms for which we can prove excellent mistake bounds, using no probabilistic assumptions. In the first part of the dissertation we study the properties of such worst-case mistake bounds (stated formally below).

In the central part of the dissertation, we present a group of linear-threshold algorithms that are particularly well suited to circumstances in which most of the attributes of the instances are irrelevant to determining the correct predictions. For target classes that these algorithms can handle, we show that the worst-case mistake bounds grow only logarithmically with the number of irrelevant attributes; a sketch of an algorithm of this kind appears below. We also demonstrate that these algorithms have some measure of robustness in the face of anomalies in the training data caused, for example, by noise.

Finally, we consider the implications of on-line, worst-case mistake bounds for learning in a batch setting with probabilistic assumptions, making use of the PAC-learning model introduced by Valiant (Val84). We present an analysis showing that a straightforward transformation applied to mistake-bounded algorithms, consisting of adding a hypothesis-testing phase, produces algorithms with asymptotically optimal PAC-learning bounds for certain target classes; this transformation is also sketched below.
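For reference, the worst-case mistake bound studied in the first part can be stated formally. The formulation below is the standard one for the on-line mistake-bound model; the notation is supplied here for clarity and is not quoted from the dissertation.

\[
  M_A(\mathcal{C}) \;=\; \max_{f \in \mathcal{C}} \; \max_{x_1, x_2, \ldots} \;
  \bigl|\{\, t : \hat{y}_t \neq f(x_t) \,\}\bigr|,
  \qquad
  \mathrm{opt}(\mathcal{C}) \;=\; \min_{A} \, M_A(\mathcal{C}),
\]

where $\hat{y}_t$ is the prediction that algorithm $A$ makes in trial $t$ on instance $x_t$, and the inner maximum ranges over all sequences of instances. An excellent mistake bound for a class $\mathcal{C}$ is one close to $\mathrm{opt}(\mathcal{C})$.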
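The abstract does not name the linear-threshold algorithms, but the best-known algorithm fitting this description is Winnow, which maintains one positive weight per attribute and updates the weights multiplicatively whenever it makes a mistake; for a target disjunction of k literals over n Boolean attributes, its mistake bound is O(k log n), logarithmic in the number of irrelevant attributes. The following Python sketch is a minimal illustration with conventional parameter choices (promotion/demotion factor alpha = 2, threshold n/2); it is not taken from the dissertation.

```python
import random

def winnow(trials, n, alpha=2.0, threshold=None):
    """On-line linear-threshold learner with multiplicative updates
    (a Winnow-style sketch; parameter choices are illustrative).

    trials yields (x, y) pairs, where x is a 0/1 vector of length n
    and y is the correct 0/1 label revealed after each prediction.
    Returns the total number of mistakes made over the sequence.
    """
    if threshold is None:
        threshold = n / 2.0
    w = [1.0] * n                      # one positive weight per attribute
    mistakes = 0
    for x, y in trials:
        # Predict 1 iff the weighted sum of active attributes
        # reaches the threshold.
        y_hat = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= threshold else 0
        if y_hat != y:
            mistakes += 1
            if y == 1:                 # false negative: promote active weights
                w = [wi * alpha if xi else wi for wi, xi in zip(w, x)]
            else:                      # false positive: demote active weights
                w = [wi / alpha if xi else wi for wi, xi in zip(w, x)]
    return mistakes

# Hypothetical usage: the target is the disjunction x[0] OR x[3],
# so 48 of the 50 attributes are irrelevant.
n = 50
examples = [[random.randint(0, 1) for _ in range(n)] for _ in range(500)]
trials = [(x, 1 if (x[0] or x[3]) else 0) for x in examples]
print(winnow(trials, n))               # typically a small mistake count
```

Note that the weights of irrelevant attributes are only ever demoted, which is what keeps the mistake count from growing with their number.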
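The transformation mentioned at the end of the abstract adds a hypothesis-testing phase to a mistake-bounded on-line algorithm. Below is a minimal sketch of the standard conversion of this kind: feed random examples to the learner and output the first hypothesis that survives a sufficiently long run of consecutive correct predictions. The learner interface (predict, update, hypothesis) and the block-length formula are illustrative assumptions, not quoted from the dissertation.

```python
import math

def mistake_bound_to_pac(learner, examples, mistake_bound, epsilon, delta):
    """Hypothesis-testing conversion of a mistake-bounded on-line
    learner into a PAC learner (a sketch; the learner interface is
    assumed, and the learner is assumed to be conservative, i.e., it
    changes its hypothesis only when it makes a mistake).

    A conservative learner with mistake bound M runs through at most
    M + 1 hypotheses.  A hypothesis with error >= epsilon survives
    `block` consecutive fresh random examples with probability at most
    exp(-epsilon * block) <= delta / (M + 1), so by a union bound the
    returned hypothesis has error < epsilon with probability >= 1 - delta.
    """
    block = math.ceil(math.log((mistake_bound + 1) / delta) / epsilon)
    streak = 0                          # consecutive correct predictions
    for x, y in examples:
        if learner.predict(x) == y:
            streak += 1
            if streak >= block:         # hypothesis passed its test
                return learner.hypothesis()
        else:
            streak = 0                  # mistake: learner moves to a new
            learner.update(x, y)        # hypothesis; restart its test
    return learner.hypothesis()
```

Since each of the at most mistake_bound + 1 hypotheses is tested on at most `block` examples, this sketch consumes O((M / epsilon) log(M / delta)) examples for a mistake bound M, which is the kind of sample-complexity behavior the abstract's PAC analysis concerns.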