Genetic programming as a model induction engine

Present-day instrumentation networks already provide immense quantities of data, very little of which offers any insight into the basic physical processes occurring in the measured medium. In other words, the data by themselves contribute little to our knowledge of such processes. Data mining and knowledge discovery aim to change this situation by providing technologies that greatly facilitate the extraction of knowledge from data. In this new setting, the role of the human expert is to provide domain knowledge, interpret the models suggested by the computer, and devise further experiments that will provide even better data coverage. Clearly, an enormous body of knowledge and understanding of physical processes already exists and should not simply be thrown away. Consequently, we strongly believe that the most appropriate way forward is to combine the best of the two approaches: the theory-driven, understanding-rich approach and the data-driven discovery process. This paper describes a particular knowledge discovery algorithm, Genetic Programming (GP). An augmented version of GP, dimensionally aware GP, which is arguably more useful in the process of scientific discovery, is also described in detail. The paper concludes with an application of dimensionally aware GP to the problem of inducing an empirical relationship that describes the additional resistance to flow induced by flexible vegetation.
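To make the idea of dimensional awareness concrete, the following is a minimal sketch of how the physical dimension of a candidate formula might be tracked through an expression tree. It is an illustration under stated assumptions, not the algorithm described in the paper: the variable names, the (length, mass, time) exponent encoding, and the `infer_dimension` helper are all hypothetical.

```python
from typing import Tuple

# Exponents of (length, mass, time); e.g. velocity = L^1 M^0 T^-1.
Dim = Tuple[int, int, int]

# Hypothetical measured variables and their dimensions (assumed for illustration).
VARIABLES = {
    "depth": (1, 0, 0),
    "velocity": (1, 0, -1),
    "g": (1, 0, -2),
}


def infer_dimension(expr) -> Tuple[Dim, int]:
    """Return (dimension, number of dimensional violations) for an expression.

    An expression is either a variable name or a tuple (op, left, right)
    with op in {"+", "-", "*", "/"}.
    """
    if isinstance(expr, str):
        return VARIABLES[expr], 0

    op, left, right = expr
    left_dim, left_viol = infer_dimension(left)
    right_dim, right_viol = infer_dimension(right)
    violations = left_viol + right_viol

    if op == "*":
        # Multiplication adds the exponents.
        return tuple(a + b for a, b in zip(left_dim, right_dim)), violations
    if op == "/":
        # Division subtracts the exponents.
        return tuple(a - b for a, b in zip(left_dim, right_dim)), violations
    if op in ("+", "-"):
        # Addition and subtraction only make sense for equal dimensions;
        # a mismatch is counted rather than rejected outright.
        if left_dim != right_dim:
            violations += 1
        return left_dim, violations
    raise ValueError(f"unknown operator: {op}")


# velocity^2 / (g * depth) is dimensionless and dimensionally consistent.
froude_like = ("/", ("*", "velocity", "velocity"), ("*", "g", "depth"))
# velocity + depth mixes incompatible dimensions.
inconsistent = ("+", "velocity", "depth")
```

In this sketch the violation count could serve as a penalty or an additional objective alongside goodness of fit, so that an evolutionary search is steered toward dimensionally consistent formulae; whether to penalize or to forbid such violations is a design choice that the sketch leaves open.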
