Discovering Interesting Patterns for Investment Decision Making with GLOWER: A Genetic Learner Overlaid with Entropy Reduction

Prediction in financial domains is notoriously difficult for a number of reasons. First, theories tend to be weak or nonexistent, which leaves problem formulation open-ended: we are forced to consider a large number of independent variables, which increases the dimensionality of the search space. Second, the weak relationships among variables tend to be nonlinear and may hold only in limited areas of the search space. Third, in financial practice, where analysts conduct extensive manual analysis of historically well-performing indicators, a key task is to find the hidden interactions among variables that perform well in combination. Unfortunately, these are exactly the patterns that the greedy search biases incorporated by many standard rule-learning algorithms will miss. In this paper, we describe and evaluate several variations of a new genetic learning algorithm (GLOWER) on a variety of data sets. The design of GLOWER was motivated by financial prediction problems, but it incorporates successful ideas from tree induction and rule learning. We examine the performance of several GLOWER variants on two UCI data sets as well as on a standard financial prediction problem (S&P500 stock returns), and use the results to identify one of the better variants for further comparisons. We then introduce a financial prediction problem that is new to KDD (predicting positive and negative earnings surprises) and experiment with GLOWER on it, contrasting it with tree- and rule-induction approaches. Our results are encouraging: they show that GLOWER can uncover effective patterns for difficult problems that have weak structure and significant nonlinearities.
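The claim that greedy search biases miss variables that perform well only in combination can be illustrated with a small sketch. This is not GLOWER itself. It is a toy example under stated assumptions: the data set is a synthetic XOR problem, and the names `entropy`, `info_gain`, `x1`, and `x2` are hypothetical and chosen only for illustration. Entropy reduction (information gain) is computed for each variable alone and for the pair jointly.

```python
import math
from collections import Counter
from itertools import product


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def info_gain(rows, labels, features):
    """Entropy reduction from partitioning on the joint values of `features`."""
    groups = {}
    for row, y in zip(rows, labels):
        key = tuple(row[f] for f in features)
        groups.setdefault(key, []).append(y)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Synthetic data (an assumption for illustration): the class is x1 XOR x2,
# so neither variable is informative alone, but the pair is fully informative.
rows = [dict(x1=a, x2=b) for a, b in product([0, 1], repeat=2)] * 25
labels = [r["x1"] ^ r["x2"] for r in rows]

gain_x1 = info_gain(rows, labels, ["x1"])
gain_x2 = info_gain(rows, labels, ["x2"])
gain_joint = info_gain(rows, labels, ["x1", "x2"])

print(f"gain(x1)     = {gain_x1:.3f} bits")
print(f"gain(x2)     = {gain_x2:.3f} bits")
print(f"gain(x1, x2) = {gain_joint:.3f} bits")
```

In this toy setting, a learner that adds one condition at a time by picking the single variable with the highest gain sees no reason to select either `x1` or `x2`, even though the two together determine the class. A search that evaluates combinations of conditions, as a genetic search over whole rules can, is not bound by that one-step view.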
