The teaching size: computable teachers and learners for universal languages

The theoretical hardness of machine teaching has usually been analyzed for a range of concept languages under several variants of the teaching dimension: the minimum number of examples that a teacher must provide for the learner to identify the concept. However, for languages where concepts have structure (and hence size), such as Turing-complete languages, a low teaching dimension can be achieved at the cost of very large examples, which are hard for the learner to process. In this paper we introduce the teaching size, a more intuitive way of assessing the theoretical feasibility of teaching concepts in structured languages. For the most general case, that of universal languages, we show that by focusing on the total size of a witness set rather than its cardinality, we can teach all total functions that are computable within some fixed time bound. We complement the theoretical results with experiments on a simple Turing-complete language, showing how teaching dimension and teaching size differ in practice. Remarkably, we find that witness sets are usually smaller than the programs they identify, an illuminating justification of why machine teaching from examples makes sense at all.
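To make the distinction concrete, here is a minimal sketch in our own notation (an illustration, not taken verbatim from the paper): for a concept c, let W(c) be the family of witness sets from which the learner uniquely identifies c, and let |x| denote the encoded length of an example x. Then

  \mathrm{TD}(c) = \min_{S \in W(c)} |S|   (teaching dimension: fewest examples)

  \mathrm{TS}(c) = \min_{S \in W(c)} \sum_{x \in S} |x|   (teaching size: smallest total encoded size)

A concept can have a small TD(c) yet a huge TS(c) when its smallest witness sets contain very long examples; bounding the total size instead of the count is what makes teaching all time-bounded computable functions feasible.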
