Predicting Shellfish Farm Closures with Class Balancing Methods

Real-time environmental monitoring can provide vital situational awareness for the effective management of natural resources. The effective operation of shellfish farms depends on environmental conditions. In this paper we propose a supervised learning approach to predicting farm closures. We treat this as a binary classification problem in which farm closure is a function of environmental variables. A difficulty with this approach is that closure events occur infrequently, which leads to a class imbalance problem. Straightforward learning techniques tend to favour the majority class, in this case continually predicting that no closure occurs. We present a new ensemble class balancing algorithm based on random undersampling to resolve this problem. Experimental results show that the class balancing ensemble performs better than individual classifiers and other state-of-the-art ensemble classifiers. Using feature ranking algorithms, we also obtain an understanding of the relative importance of the environmental variables for shellfish farm closure.
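
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of ensemble class balancing by random undersampling, not the authors' method. Each ensemble member is trained on all minority-class rows plus an equally sized random sample of majority-class rows, and members are combined by majority vote. The function names, the nearest-centroid base learner, and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter


def balanced_subsets(X, y, n_subsets, seed=0):
    """Yield balanced training sets for a binary problem: every minority
    row plus an equally sized random sample of majority rows."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    for _ in range(n_subsets):
        idx = min_idx + rng.sample(maj_idx, len(min_idx))
        yield [X[i] for i in idx], [y[i] for i in idx]


def fit_centroids(X, y):
    """Hypothetical base learner: the mean feature vector of each class."""
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroids(model, row):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))


def fit_ensemble(X, y, n_members=11, seed=0):
    """Train one base learner per balanced subset."""
    return [fit_centroids(Xb, yb) for Xb, yb in balanced_subsets(X, y, n_members, seed)]


def predict_ensemble(models, row):
    """Combine the members by majority vote (odd n_members avoids binary ties)."""
    votes = Counter(predict_centroids(m, row) for m in models)
    return votes.most_common(1)[0][0]


# Made-up toy data: 40 "open" rows (label 0) and 4 "closure" rows (label 1).
X = [[(i % 5) * 0.1, (i % 7) * 0.1] for i in range(40)]
X += [[5.0 + 0.1 * k, 5.0] for k in range(4)]
y = [0] * 40 + [1] * 4

models = fit_ensemble(X, y)
print(predict_ensemble(models, [5.0, 5.0]), predict_ensemble(models, [0.2, 0.2]))
```

Because every member sees the two classes in equal proportion, no single learner can reach low training error simply by always predicting the majority class, while drawing a different majority sample for each member limits the information lost to undersampling.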
