Feature selection methods for characterizing and classifying adaptive Sustainable Flood Retention Basins.

The European Union's Flood Directive 2007/60/EC requires member states to produce flood risk maps for all river basins and coastal areas at risk of flooding by 2013. As a result, flood risk assessment has become an urgent challenge requiring a range of rapid and effective tools and approaches. The Sustainable Flood Retention Basin (SFRB) concept has evolved to provide a rapid assessment technique for impoundments that have a pre-defined or potential role in flood defense and diffuse pollution control. A previous version of the SFRB survey method, developed by co-author Scholz in 2006, recommends gathering more than 40 variables to characterize an SFRB. Collecting all these variables is relatively time-consuming and, more importantly, the variables are often correlated with each other. The objective is therefore to explore the correlations among these variables and to find the most important variables for representing an SFRB. Three feature selection techniques (Information Gain, Mutual Information and Relief) were applied to the SFRB data set to identify the importance of the variables in terms of classification accuracy. Four benchmark classifiers (Support Vector Machine, K-Nearest Neighbours, C4.5 Decision Tree and Naïve Bayes) were subsequently used to verify the effectiveness of classification with the selected variables and to identify the optimal number of variables automatically. Experimental results indicate that the proposed approach provides a simple, rapid and effective framework for variable selection and SFRB classification. Nine variables are sufficient to classify SFRBs accurately. Finally, six typical cases were studied to verify the performance of the nine identified variables on different SFRB types. The findings provide a rapid scientific tool for SFRB assessment in practice. Moreover, the generic nature of this tool also allows for its wide application in other areas.
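The rank-then-verify idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only Information Gain (not Mutual Information or Relief), replaces the four benchmark classifiers with a simple 1-nearest-neighbour classifier scored by leave-one-out accuracy, and runs on synthetic toy data rather than the SFRB data set. All function names and the toy variables are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """H(class) - H(class | feature) for one discrete feature column."""
    n = len(labels)
    conditional = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def rank_features(X, y):
    """Feature indices sorted by decreasing information gain."""
    gains = [information_gain([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(gains)), key=lambda j: gains[j], reverse=True)


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of 1-NN using Hamming distance on `features`."""
    correct = 0
    for i, row in enumerate(X):
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum(row[f] != X[j][f] for f in features),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)


def select_feature_count(X, y):
    """Score the top-k ranked features for every k; keep the smallest best k."""
    ranking = rank_features(X, y)
    scores = [loo_accuracy(X, y, ranking[:k]) for k in range(1, len(ranking) + 1)]
    best_k = scores.index(max(scores)) + 1
    return ranking[:best_k], scores


# Synthetic toy data (an assumption, not SFRB survey data): three classes,
# one fully informative variable, one partly informative, two noise variables.
rng = random.Random(0)
y = [i % 3 for i in range(60)]
X = [[label, int(label == 0), rng.randint(0, 1), rng.randint(0, 1)] for label in y]

selected, scores = select_feature_count(X, y)
print("selected variables:", selected)
print("accuracy for top-k variables:", scores)
```

Choosing the smallest k that reaches the best accuracy mirrors the stated goal of reducing survey effort; a real study would use continuous variables, proper discretization, and the classifiers named above.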
