Occam's razor, the principle of parsimony, is a tool that finds application in many areas of science. Its validity has never been proven in full generality from first principles. Convergent guessing is the property that as more examples of an input-output mapping are provided, one would expect the modelling of that mapping to become more and more accurate. It too is widely used and has not been proven from first principles. In this paper it is shown that Occam's razor and convergent guessing are not independent: if convergent guessing holds, so does Occam's razor. (The converse of this statement is also true, providing some extra conditions are met.) Therefore, if you have reason to believe that your guesses are getting more accurate as you are fed more data, you also have reason to believe that application of Occam's razor will likely result in better guesses. Rather than attributes concerning how an architecture works (e.g., its coding length, or its number of free parameters), this paper is concerned exclusively with how the architecture guesses (which is, after all, what we're really interested in). In this context Occam's razor means that one should guess according to the "simplicity" of an architecture's guessing behavior (as opposed to according to the simplicity of how the architecture works). This paper deduces an optimal measure of the "simplicity" of an architecture's guessing behavior. Given this optimal simplicity measure, this paper then establishes the aforementioned relationship between Occam's razor and convergent guessing. This paper goes on to elucidate the many other advantages, both practical and theoretical, of using the optimal simplicity measure. Finally, this paper ends by exploring the ramifications of this analysis for the question of how best to measure the "complexity" of a system.

*Electronic mail address: dhw@coot.lanl.gov. © 1990 Complex Systems Publications, Inc.
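The connection between the two principles can be illustrated with a toy sketch (this example is not from the paper; the simplicity measure used here, counting 1s in a truth table, is a crude stand-in assumption rather than the paper's optimal measure). Among all Boolean functions of three inputs consistent with the examples seen so far, a guesser that always picks the "simplest" one exhibits convergent guessing: its error over the full input space never increases as more examples arrive.

```python
import itertools

# All 8 possible 3-bit inputs.
INPUTS = list(itertools.product([0, 1], repeat=3))

def target(x):
    # A deliberately simple target mapping: AND of the first two bits.
    return x[0] & x[1]

def simplicity(bits):
    # Stand-in complexity proxy (an assumption for illustration only):
    # truth tables containing fewer 1s count as "simpler".
    return sum(bits)

def best_consistent_guess(examples, query):
    # Occam's razor applied to guessing behavior: among all 2^8 truth
    # tables consistent with the examples, use the simplest one to
    # predict the output at the query point.
    best_table, best_bits = None, None
    for bits in itertools.product([0, 1], repeat=len(INPUTS)):
        table = dict(zip(INPUTS, bits))
        if all(table[x] == y for x, y in examples):
            if best_table is None or simplicity(bits) < simplicity(best_bits):
                best_table, best_bits = table, bits
    return best_table[query]

def error_rate(n_examples):
    # Feed the guesser the first n input-output pairs, then measure its
    # error over the entire input space.
    examples = [(x, target(x)) for x in INPUTS[:n_examples]]
    wrong = sum(best_consistent_guess(examples, q) != target(q) for q in INPUTS)
    return wrong / len(INPUTS)
```

Running `error_rate` for increasing numbers of examples yields a non-increasing error curve, which is the behavior the abstract calls convergent guessing; the point of the paper is that this connection is not an accident of the toy setup.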