Guest Editors' foreword

This special issue of Theoretical Computer Science is dedicated to the 21st International Conference on Algorithmic Learning Theory (ALT 2010), held in Canberra, Australia, October 6–8, 2010. It contains nine articles that were among the best presented at the conference.1 The authors of these papers were invited by the Special Issue Editors to submit completed versions of their work for this special issue. Once received, these papers underwent the usual refereeing process of Theoretical Computer Science. In the following, we briefly introduce each of the papers.

Algorithmic learning theory focuses on theoretical aspects of machine learning. It is dedicated to studies of learning from a mathematical and algorithmic perspective. Depending on the learning task considered, considerable interaction is required between various mathematical theories, including statistics, probability theory, combinatorics, linguistics, and the theory of computation. These studies comprise the investigation of various formal models of machine learning and statistical learning, as well as the design and analysis of learning algorithms. This also leads to a fruitful interaction with the practical fields of machine learning, linguistics, psychology, and philosophy of science.

The first paper in this special issue belongs to the area of PAC learning. In this model the learner observes data concerning a target concept, generated according to an unknown probability distribution, but it does not have to figure out those aspects of the concept to be learned which are unlikely to be observed. That is, when learning a concept L, the learner observes data drawn at random according to some unknown probability distribution D and has to find, with high probability, a hypothesis H that is similar to L with respect to the distribution D, i.e., such that D({x | H(x) ≠ L(x)}) is below a bound given to the algorithm as a parameter. In their paper Tighter PAC–Bayes bounds through distribution-dependent priors, Lever, Laviolette, and Shawe-Taylor prove sharp risk bounds for stochastic exponential weight algorithms. The idea is to base the analysis on a prior defined in terms of the data-generating distribution. The authors derive a number of PAC–Bayes bounds for Gibbs classifiers using prior and posterior distributions which are defined in terms of the regularized true and empirical risks for the problem, respectively. The results rely on a key bound on the Kullback–Leibler divergence between distributions of this form. Furthermore, this bound introduces a new complexity measure.

The topic of Pestov's paper is already explained by its title, PAC learnability under non-atomic measures: A problem by Vidyasagar. In the PAC learning model the standard lower bounds on the sample complexity in terms of the Vapnik–Chervonenkis dimension and similar quantities are based on very adversarial probability distributions. These distributions are defined in such a way that all the probability mass is allocated to a small number of “difficult” points. So it is only natural to ask how the lower bounds extend to cases where the distributions are a bit more reasonable. In 1997 Vidyasagar posed the problem of characterizing learnability under non-atomic distributions, where a distribution D is said to be non-atomic if every set A with D(A) > 0 has a subset B with 0 < D(B) < D(A). Pestov resolves this problem by introducing the notion of the Vapnik–Chervonenkis dimension modulo countable sets, which allows him to obtain a complete characterization.
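In symbols, writing ε for the error bound handed to the learner and 1 − δ for the required success probability (this notation is ours and is not fixed in the papers themselves, and the probability is taken over the randomly drawn data), the PAC requirement and the non-atomicity condition just described read:

\[
\Pr\bigl[\, D(\{x \mid H(x) \neq L(x)\}) \le \varepsilon \,\bigr] \ge 1 - \delta ,
\qquad
\forall A \,\bigl( D(A) > 0 \Rightarrow \exists B \subseteq A : 0 < D(B) < D(A) \bigr).
\]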
Note that instead of the usual measure-theoretic assumptions about the concept class, the author assumes Martin’s Axiom, a set-theoretic axiom weaker than the continuum hypothesis. The results are also extended to the case of learning real-valued functions.

The next paper belongs to the areas of learning probabilistic automata and query learning. In the query learning model the learner aims to identify a concept which a teacher is teaching. To this end, the learner is allowed to ask queries which the teacher has to answer truthfully, but not more helpfully than required. In most settings of query learning, the queries are of a fixed form. One type of such queries are statistical queries, where an underlying distribution is assumed and the teacher returns a polynomial-time program whose error probability with respect to the underlying distribution is below a parameter given in the query. The paper Learning probabilistic automata: A study in state distinguishability by Balle, Castro, and