News and trading rules

AI has long been applied to the problem of predicting financial markets. While AI researchers see financial forecasting as a fascinating challenge, predicting markets also has powerful implications for financial economics, in particular the study of market efficiency. Recently, economists have turned to AI for tools, using genetic algorithms to build trading strategies and examining the returns those strategies generate as evidence of market inefficiency. The primary aim of this thesis is to take this basic approach and put the artificial intelligence techniques it uses on a firm footing, in two ways: first, by adapting AI techniques to the stunning amount of noise in financial data; second, by introducing a new source of data untapped by traditional forecasting methods: news. I start with practitioner-developed technical analysis constructs, systematically examining their ability to generate trading rules that are profitable on a large universe of stocks. Then, I use these technical analysis constructs as the underlying representation for a simple trading rule learner, paying close attention to limiting search and representation to fight over-fitting. In addition, I explore the use of ensemble methods to improve performance. Finally, I introduce the use of textual data from internet message boards and news stories, studying it both in isolation and as a way to augment numerical trading strategies.
