Upper and Lower Error Bounds for Active Learning

This paper analyzes the potential advantages and theoretical challenges of “active learning” algorithms. Active learning involves sequential, adaptive sampling procedures that use information gleaned from previous samples in order to focus the sampling and accelerate the learning process relative to “passive learning” algorithms, which are based on non-adaptive (usually random) samples. There are a number of empirical and theoretical results suggesting that in certain situations active learning can be significantly more effective than passive learning. However, the fact that active learning algorithms are feedback systems makes their theoretical analysis very challenging. It is known that active learning can provably improve on passive learning if the error or noise rate of the sampling process is bounded. However, if the noise rate is unbounded, perhaps the situation most common in practice, then no previously existing theory establishes whether or not active learning offers an advantage. To study this issue, we investigate the basic problem of learning a threshold function from noisy observations. We present an algorithm that provably improves on passive learning, even when the noise is unbounded. Moreover, we derive a minimax lower bound for this learning problem, demonstrating that our proposed active learning algorithm converges at a near-optimal rate.
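To make the active/passive contrast concrete, the following is a minimal sketch of the general idea in the one-dimensional threshold setting: an adaptive sampler that bisects the interval, requesting several noisy labels at each query point and keeping the majority vote, versus a non-adaptive baseline that samples uniformly at random and fits the best-separating threshold. This is an illustrative assumption, not the algorithm analyzed in the paper, and the majority-vote step implicitly assumes a noise rate bounded away from 1/2; the paper's contribution concerns the harder unbounded-noise regime. All function names and parameter values are hypothetical.

```python
import random


def noisy_label(x, threshold, flip_prob):
    """Return the true label 1{x >= threshold}, flipped with probability flip_prob."""
    label = 1 if x >= threshold else 0
    return 1 - label if random.random() < flip_prob else label


def active_threshold_estimate(query, lo=0.0, hi=1.0, rounds=20, repeats=15):
    """Adaptive (active) sampling: bisect the interval, requesting several
    noisy labels at each midpoint and keeping the majority vote."""
    for _ in range(rounds):
        mid = 0.5 * (lo + hi)
        votes = sum(query(mid) for _ in range(repeats))
        if 2 * votes >= repeats:   # majority label 1: threshold lies left of mid
            hi = mid
        else:                      # majority label 0: threshold lies right of mid
            lo = mid
    return 0.5 * (lo + hi)


def passive_threshold_estimate(query, n=300):
    """Non-adaptive (passive) baseline: sample uniformly at random and pick
    the candidate threshold with the fewest misclassified labels."""
    xs = sorted(random.random() for _ in range(n))
    data = [(x, query(x)) for x in xs]
    best_t, best_err = 0.5, float("inf")
    for t, _ in data:
        err = sum((x >= t) != y for x, y in data)
        if err < best_err:
            best_t, best_err = t, err
    return best_t


if __name__ == "__main__":
    true_threshold, flip_prob = 0.37, 0.2   # illustrative values
    q = lambda x: noisy_label(x, true_threshold, flip_prob)
    print("active estimate: ", active_threshold_estimate(q))
    print("passive estimate:", passive_threshold_estimate(q))
```

With the same total number of label requests, the adaptive sampler concentrates its queries near the suspected threshold, which is the source of the faster convergence rates studied in the paper; the passive baseline spreads its samples uniformly regardless of what it has already observed.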