Grasshopper Infestation Prediction: An Application of Data Mining to Ecological Modeling

This thesis presents a case study of applying machine learning tools to build a predictive model of annual infestations of grasshoppers in Eastern Oregon. The purpose of the study was twofold. First, we wanted to develop a predictive model. Second, we wanted to explore the capabilities of existing machine learning tools and identify areas where further research was needed. The study succeeded in constructing a model with modest ability to predict future grasshopper infestations and to advise farmers on whether to treat their croplands with pesticides. Our analysis of the learned model shows that it should be able to provide useful advice if the ratio of treatment cost to crop damage cost is 1 to 1.67 or more. However, there is some evidence that the model is not able to make good predictions in years with extremely high levels of grasshopper infestation. To arrive at this successful model, three critical steps had to be taken. First, we had to properly formulate the prediction task, both in terms of exactly what we were trying to predict (i.e., the probability of infestation) and the spatial area over which we could make predictions (i.e., areas within a 6.77 km radius of a weather station). Second, we had to define and extract a set of features that incorporated knowledge of the grasshopper life cycle. Third, we had to employ evaluation metrics that were able to measure small improvements in the quality of predictions. The study identified important directions for future research. In the area of grasshopper ecology, there is a need for improved data-gathering tools, including a much denser and more widespread network of weather stations. These stations should also measure subsoil temperatures. Recording the hatching dates of grasshopper nymphs would also be very valuable.
In machine learning, methods are needed for automating the definition and extraction of features guided by qualitative knowledge of the domain, such as our qualitative knowledge of the grasshopper life cycle.
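
The link between a predicted probability of infestation and advice about pesticide treatment can be illustrated with a simple cost-loss rule. The sketch below is not the decision procedure used in the thesis; it assumes, for illustration only, that treatment fully prevents crop damage, so treating pays off when the expected damage exceeds the treatment cost. The function names and the example probabilities are hypothetical; only the 1 to 1.67 cost ratio comes from the text above.

```python
def break_even_probability(treatment_cost: float, damage_cost: float) -> float:
    """Infestation probability above which treating has lower expected cost.

    Assumption (not from the thesis): treatment fully prevents the damage.
    """
    if treatment_cost <= 0 or damage_cost <= 0:
        raise ValueError("costs must be positive")
    return treatment_cost / damage_cost


def should_treat(p_infestation: float, treatment_cost: float, damage_cost: float) -> bool:
    """Recommend treatment when expected damage exceeds the treatment cost."""
    if not 0.0 <= p_infestation <= 1.0:
        raise ValueError("p_infestation must lie in [0, 1]")
    return p_infestation * damage_cost > treatment_cost


# Cost ratio of 1 to 1.67, as quoted in the abstract.
threshold = break_even_probability(1.0, 1.67)

# Hypothetical predicted probabilities, for illustration only.
advice_high = should_treat(0.7, 1.0, 1.67)
advice_low = should_treat(0.5, 1.0, 1.67)
```

Under this assumed rule, a 1 to 1.67 cost ratio corresponds to a break-even probability of roughly 0.6, so the model's predictions matter for the decision only when they can fall on either side of that value.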