Learn More about Your Data: A Symbolic Regression Knowledge Representation Framework

In this paper, we propose a flexible knowledge representation framework that uses symbolic regression to learn from data and mathematical expressions to represent the captured knowledge. In this approach, learning algorithms generate new insights which can be added to domain knowledge bases that, in turn, support further symbolic regression runs. The well-known regression analysis is thereby generalized to perform supervised classification. The approach aims to produce a learning model which best separates the class members of a labeled training set. The class boundaries are given by a separation surface represented as a level set of a model function; the separation boundary is defined by the corresponding equation. In our symbolic approach, the learned knowledge model is represented by mathematical formulas and is composed of an optimal subset of expressions drawn from a given superset. We show that this property gives human experts options to gain additional insights into the application domain. Furthermore, the representation in terms of mathematical formulas (e.g., the analytical model and its first and second derivatives) adds value to the classifier and makes it possible to answer questions that sub-symbolic classifier approaches cannot. The symbolic representation of the models enables interpretation by human experts. Existing expert knowledge can be added to the developed knowledge representation framework or used as constraints. Additionally, the knowledge acquisition process can be applied iteratively: in each step, new insights from the search process are added to the knowledge base to improve the overall performance of the proposed learning algorithms.
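The classification principle described above can be illustrated with a minimal sketch. The expression below is a hypothetical toy model (the unit circle), not a result from the paper: a learned symbolic model function defines the separation surface as its zero level set, points are classified by the sign of the function, and the symbolic form directly yields derivatives for further analysis.

```python
import sympy as sp

# Hypothetical learned model expression (an assumption for illustration,
# not taken from the paper's experiments).
x1, x2 = sp.symbols("x1 x2")
f = x1**2 + x2**2 - 1  # separation surface: the level set f(x1, x2) = 0

def classify(point):
    """Assign a class label by the sign of the model function at the point."""
    return 1 if f.subs({x1: point[0], x2: point[1]}) > 0 else 0

# Because the model is symbolic, its derivatives are available analytically,
# e.g. the gradient, which is normal to the separation surface:
grad = [sp.diff(f, v) for v in (x1, x2)]

print(classify((2.0, 0.0)))  # outside the unit circle -> class 1
print(classify((0.1, 0.1)))  # inside the unit circle -> class 0
print(grad)                  # [2*x1, 2*x2]
```

This is exactly the kind of question a sub-symbolic classifier cannot answer directly: the gradient here is a closed-form expression a human expert can read, not a numerical approximation.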
