Predicting the unpredictable

A major difficulty for existing theories of inductive inference is the question of what to do when novel, unknown, or previously unsuspected phenomena occur. In this paper one particular instance of this difficulty is considered, the so-called sampling of species problem. The classical probabilistic theories of inductive inference due to Laplace, Johnson, de Finetti, and Carnap adopt a model of simple enumerative induction in which there is a prespecified number of types or species which may be observed. But, realistically, this is often not the case. In 1838 the English mathematician Augustus De Morgan proposed a modification of the Laplacian model to accommodate situations where the possible types or species to be observed are not assumed to be known in advance; but he did not advance a justification for his solution. In this paper a general philosophical approach to such problems is suggested, drawing on work of the English mathematician J. F. C. Kingman. It then emerges that the solution advanced by De Morgan has a very deep, if not totally unexpected, justification. The key idea is that although 'exchangeable' random sequences are the right objects to consider when all possible outcome-types are known in advance, exchangeable random partitions are the right objects to consider when they are not. The result turns out to be very satisfying. The classical theory has several basic elements: a representation theorem for the general exchangeable sequence (the de Finetti representation theorem), a distinguished class of sequences (those employing Dirichlet priors), and a corresponding rule of succession (the continuum of inductive methods). The new theory has parallel basic elements: a representation theorem for the general exchangeable random partition (the Kingman representation theorem), a distinguished class of random partitions (the Poisson-Dirichlet process), and a rule of succession which corresponds to De Morgan's rule.
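The contrast between the two rules of succession can be made concrete with a minimal sketch. The abstract states no formulas, so the forms below are assumptions: the classical rule is written in the standard Johnson-Carnap form (n_i + alpha) / (n + t * alpha) for t types fixed in advance, and the partition rule is written in the standard one-parameter form associated with the Ewens sampling formula, where a new species appears with probability theta / (n + theta). The function names and the parameters `alpha` and `theta` are illustrative, not taken from the paper.

```python
from fractions import Fraction


def classical_rule(counts, alpha=1):
    """Predictive probabilities when all t types are known in advance.

    Assumed form (Johnson-Carnap continuum): (n_i + alpha) / (n + t * alpha).
    `counts` lists the observed count of every possible type, zeros included.
    """
    alpha = Fraction(alpha)
    n = sum(counts)
    t = len(counts)
    return [(c + alpha) / (n + t * alpha) for c in counts]


def partition_rule(counts, theta=1):
    """Predictive probabilities when the set of species is not known in advance.

    Assumed form (Ewens-type rule): an already observed species i recurs with
    probability n_i / (n + theta); a new species appears with probability
    theta / (n + theta). `counts` lists only the species seen so far.
    """
    theta = Fraction(theta)
    n = sum(counts)
    seen = [c / (n + theta) for c in counts]
    new = theta / (n + theta)
    return seen, new


if __name__ == "__main__":
    # Three possible types fixed in advance, one of them not yet observed.
    print(classical_rule([3, 1, 0], alpha=1))
    # Two species observed so far; the number of unseen species is left open.
    print(partition_rule([3, 1], theta=1))
```

In the classical rule the unobserved type must be listed explicitly with a count of zero, whereas the partition rule reserves probability for "something new" without naming it, which is the distinction the abstract draws between exchangeable sequences and exchangeable partitions.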
