A modified genetic algorithm and weighted principal component analysis based feature selection and extraction strategy in agriculture

Abstract: Data pre-processing transforms raw data into a format suitable for machine learning (ML) techniques. Feature selection (FS) and feature extraction (FeExt) are significant components of data pre-processing. FS identifies the relevant features that enhance the accuracy of a model. Since agricultural data contain diverse features related to climate, soil, and fertilizer, FS is particularly important, as irrelevant features may adversely affect the predictions of the resulting model. Likewise, FeExt derives new attributes from the existing attributes; these new features retain the information in the original attributes without the redundancy. With these points in mind, this work proposes a hybrid feature selection and feature extraction strategy for agricultural datasets. A modified Genetic Algorithm (m-GA) was developed by designing a fitness function based on “Mutual Information” (MutInf) and “Root Mean Square Error” (RtMSE) to choose the features that most affect the target attribute (crop yield in this case). The selected features were then subjected to feature extraction using “weighted principal component analysis” (wgt-PCA). The extracted features were fed into different ML models, viz. “Regression” (Reg), “Artificial Neural Networks” (ArtNN), “Adaptive Neuro Fuzzy Inference System” (ANFIS), “Ensemble of Trees” (EnT), and “Support Vector Regression” (SuVR). Trials on 3 benchmark and 8 real-world farming datasets showed that the developed hybrid feature selection and extraction technique achieved significant improvements in Rsq2, RtMSE, and “mean absolute error” (MAE) over FS and FeExt methods such as Correlation Analysis (CA), Singular Value Decomposition (SiVD), Genetic Algorithm (GA), and wgt-PCA.
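The selection stage described in the abstract can be illustrated with a minimal sketch. The abstract does not state how MutInf and RtMSE are combined in the fitness function, so everything below is an assumption made for illustration: the fitness is taken as the mean histogram-based mutual information between each selected feature and the target, minus a weight `alpha` times the leave-one-out RMSE of a 1-nearest-neighbour predictor. The function names (`mutual_information`, `loo_rmse`, `fitness`, `select_features`) and all parameters are hypothetical, and the wgt-PCA stage is not covered.

```python
import math
import random
from collections import Counter

def mutual_information(x, y, bins=4):
    """Histogram estimate of I(X;Y) in nats, using equal-width bins."""
    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # guard against constant columns
        return [min(int((v - lo) / width), bins - 1) for v in values]
    bx, by = discretize(x), discretize(y)
    n = len(x)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    # sum of p(x,y) * log(p(x,y) / (p(x) p(y))) over the observed cells
    return sum(c / n * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())

def loo_rmse(X, y, mask):
    """Leave-one-out RMSE of a 1-nearest-neighbour predictor on masked columns."""
    cols = [j for j, bit in enumerate(mask) if bit]
    sq_err = 0.0
    for i, row in enumerate(X):
        nearest = min((k for k in range(len(X)) if k != i),
                      key=lambda k: sum((row[j] - X[k][j]) ** 2 for j in cols))
        sq_err += (y[i] - y[nearest]) ** 2
    return math.sqrt(sq_err / len(X))

def fitness(mask, X, y, alpha=1.0):
    """ASSUMED combination: mean MI with the target minus alpha * RMSE."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return float("-inf")  # an empty feature subset is never acceptable
    mi = sum(mutual_information([r[j] for r in X], y) for j in cols) / len(cols)
    return mi - alpha * loo_rmse(X, y, mask)

def select_features(X, y, pop_size=12, generations=15, p_mut=0.1, seed=0):
    """Plain GA over 0/1 feature masks; needs at least two features."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    score = lambda m: fitness(m, X, y)
    pop = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(pop, key=score)]  # elitism: keep the current best mask
        while len(children) < pop_size:
            a = max(rng.sample(pop, 2), key=score)  # binary tournament
            b = max(rng.sample(pop, 2), key=score)
            cut = rng.randrange(1, n_feat)          # one-point crossover
            # bit-flip mutation with probability p_mut per gene
            children.append([bit ^ (rng.random() < p_mut)
                             for bit in a[:cut] + b[cut:]])
        pop = children
    return max(pop, key=score)
```

Calling `select_features(X, y)` with `X` as a list of numeric rows and `y` as a list of target values returns a 0/1 mask over the columns of `X`. The columns marked 1 would then be passed on to the extraction stage.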
