Integrating Genetic Algorithms and Text Learning for Financial Prediction

This paper takes two approaches to predicting financial markets using text data downloaded from web bulletin boards. The first uses maximum entropy text classification to predict based on the whole body of text; the second uses a genetic algorithm (GA) to learn simple rules based solely on numerical data: trading volume, number of messages posted per day, and total number of words posted per day. While both approaches produce positive excess returns in some cases, integrating the two predictors produces far superior results. Furthermore, aggregating multiple GA trials to build single predictors improves performance further.
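
As a rough illustration of what aggregating several GA trials into a single predictor could look like, the sketch below combines simple threshold rules by majority vote. The abstract does not specify the rule form or the aggregation scheme, so everything here is an assumption made for illustration: the threshold rules, the vote, the feature names, and the numeric values are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical daily features, named after the numerical inputs in the abstract.
Features = Dict[str, float]
Rule = Callable[[Features], int]  # +1 predicts an up move, -1 a down move


def make_threshold_rule(feature: str, threshold: float) -> Rule:
    """Build a simple rule: predict +1 if the feature exceeds the threshold."""
    def rule(day: Features) -> int:
        return 1 if day[feature] > threshold else -1
    return rule


def majority_vote(rules: List[Rule], day: Features) -> int:
    """Combine rules by majority vote; returns 0 when the vote is tied."""
    total = sum(rule(day) for rule in rules)
    return (total > 0) - (total < 0)


# Stand-ins for the best rule from each of three independent GA runs
# (assumed values, not results from the paper).
rules = [
    make_threshold_rule("volume", 1_000_000),
    make_threshold_rule("messages", 50),
    make_threshold_rule("words", 8_000),
]
day = {"volume": 1_200_000, "messages": 30, "words": 9_500}
prediction = majority_vote(rules, day)
```

Here two of the three rules vote for an up move, so the combined prediction is +1. Voting is only one possible aggregation scheme; averaging rule outputs or weighting rules by training fitness would be equally plausible readings.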