Costs of General Purpose Learning

In a surprising result, Leo Harrington constructed a machine which can learn any computable function f according to the following criterion (called Bc*-identification). His machine, on the successive graph points of f, outputs a corresponding infinite sequence of programs p_0, p_1, p_2, ..., and, for some i, the programs p_i, p_{i+1}, p_{i+2}, ... each compute a variant of f which differs from f at only finitely many argument places. A machine with this property is called general purpose. The sequence p_i, p_{i+1}, p_{i+2}, ... is called a final sequence. For Harrington's general purpose machine, for distinct m and n, the finitely many argument places where p_{i+m} fails to compute f can be very different from the finitely many argument places where p_{i+n} fails to compute f. One would hope, though, that if Harrington's machine, or an improvement thereof, inferred the program p_{i+m} based on the data points f(0), f(1), ..., f(k), then p_{i+m} would make very few mistakes computing f at the "near-future" arguments k+1, k+2, ..., k+l, where l is reasonably large. Ideally, p_{i+m}'s finitely many mistakes or anomalies would (mostly) occur at arguments x ≫ k, i.e., ideally, its anomalies would be placed well beyond the near-future arguments. In the present paper, for general purpose learning machines, it is analyzed just how well or badly placed these anomalies may be with respect to near-future arguments, and what the various tradeoffs are.

In particular, there is good news and bad. The bad news is that, for any learning machine M (including general purpose M) and for every m, there exist infinitely many computable functions f such that M, on f, infinitely often incorrectly predicts f's next m near-future values. The good news is that, for a suitably clever general purpose learning machine M, for each computable f, the density of the associated bad prediction intervals of size m, for M on f, is vanishingly small.

Also considered is the possibility of providing a general purpose learner which additionally learns some interesting classes with respect to much stricter criteria than Bc*-identification. Again there is good news and bad. The criterion of finite identification requires for success that a learner M on a function f output exactly one program, and that this program correctly compute f. Bc^n-identification is just like Bc*-identification above, except that the number of anomalies in each program of a final sequence is ≤ n. The bad news is that there is a finitely identifiable class of computable functions C such that, for no general purpose learner M and for no n, does M additionally Bc^n-identify C. Ex-identification by M on f requires that M on f converge, after finitely many output programs, to a single final program which correctly computes f. A reliable learner (by definition) never deceives by false convergence; more precisely, whenever it converges to a final program on a function f, it Ex-identifies f. The good news is that, for any class C which can be reliably Ex-identified, there is a general purpose machine which additionally Ex-identifies C!
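For concreteness, the success criteria described informally above can be stated as follows. This is a sketch in the usual inductive-inference notation, not taken verbatim from the paper: here φ_p denotes the partial computable function computed by program p in a fixed acceptable programming system, f[k] = (f(0), ..., f(k-1)) is the length-k initial segment of f, and p_k = M(f[k]) is M's conjecture after seeing the first k values of f.

% Assumed standard notation (not from the paper itself): \varphi_p is the partial
% computable function computed by program p; f[k] = (f(0), ..., f(k-1));
% p_k = M(f[k]) is M's k-th conjecture on f.
\begin{itemize}
  \item $M$ \textup{Ex}-identifies $f$ iff $(\exists i)(\exists p)\,[(\forall k \ge i)\ p_k = p \ \wedge\ \varphi_p = f]$.
  \item $M$ \textup{Bc}$^{n}$-identifies $f$ iff $(\exists i)(\forall k \ge i)\ \mathrm{card}\{x : \varphi_{p_k}(x) \neq f(x)\} \le n$.
  \item $M$ \textup{Bc}$^{*}$-identifies $f$ iff $(\exists i)(\forall k \ge i)\ \{x : \varphi_{p_k}(x) \neq f(x)\}$ is finite.
  \item $M$ finitely identifies $f$ iff $M$ on $f$ outputs exactly one program $p$, and $\varphi_p = f$.
  \item $M$ is reliable iff, for every total $f$ on which $M$ converges to some program $p$, $\varphi_p = f$.
\end{itemize}

In this notation, Harrington's result stated above is that a single machine $M$ \textup{Bc}$^{*}$-identifies every computable $f$; such a machine is called general purpose.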
