Genetic algorithm based new sequence of principal component regression (GA-NSPCR) for feature selection and yield prediction using hyperspectral remote sensing data

Hyperspectral images have recently been used to estimate the yield of food crops. These images consist of a large number of bands, which require sophisticated methods to analyze. One way to reduce computational cost and accelerate knowledge discovery is to eliminate bands that do not add value to the analysis. In this paper, a genetic algorithm based new sequence of principal component regression (GA-NSPCR) method is proposed and tested using 116-band HyMap airborne hyperspectral data and yield data collected from paddy fields. The proposed method uses a GA to select an initial subset of hyperspectral bands and subsequently generates a more accurate subset by minimizing the prediction error of a model defined by principal component regression (PCR). Unlike standard PCR methods, which order the features by singular value, NSPCR orders the features in each generation by the squared multiple correlation coefficient R². Yield data and spectral data are used to generate separate training and testing datasets using eight rounds of bootstrap resampling (8-round BSR) to deal with the limited number of samples in the training data. In contrast to standard GA implementations, the fitness function evaluates three Lp-norms to obtain the best prediction model.
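The reordering idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes NumPy, hypothetical function names (`components_by_r2`, `fit_nspcr`, `predict`, `lp_errors`), and that the three Lp-norms are L1, L2 and L-infinity, which the abstract does not specify. The GA search over band subsets and the bootstrap resampling are omitted. Because principal component scores are mutually orthogonal, the R² of a regression on any set of components is the sum of their individual squared correlations with the yield, so sorting components by that value gives the sequence used here.

```python
import numpy as np

def components_by_r2(X, y):
    """Compute PC scores and rank them by squared correlation with y
    (instead of by singular value, as standard PCR does)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    keep = s > 1e-10 * s[0]              # drop numerically null components
    scores = U[:, keep] * s[keep]        # orthogonal score vectors
    r2 = (scores.T @ yc) ** 2 / ((scores ** 2).sum(axis=0) * (yc @ yc))
    order = np.argsort(r2)[::-1]         # most predictive component first
    return scores, Vt[keep], r2, order

def fit_nspcr(X, y, n_components):
    """Regress y on the n_components scores with the highest R^2."""
    scores, loadings, _, order = components_by_r2(X, y)
    top = order[:n_components]
    T = scores[:, top]
    # Scores are orthogonal, so each coefficient is a simple projection.
    gamma = (T.T @ (y - y.mean())) / (T ** 2).sum(axis=0)
    beta = loadings[top].T @ gamma       # map back to band space
    return X.mean(axis=0), y.mean(), beta

def predict(model, X):
    x_mean, y_mean, beta = model
    return y_mean + (X - x_mean) @ beta

def lp_errors(y_true, y_pred):
    """Assumed choice of norms (L1, L2, L-infinity) of the residual vector."""
    r = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return {"L1": np.abs(r).sum(),
            "L2": np.sqrt((r ** 2).sum()),
            "Linf": np.abs(r).max()}
```

In a GA setting, a candidate band subset would be applied as a column mask on `X` before calling `fit_nspcr`, and the values returned by `lp_errors` on held-out samples would feed the fitness score. How the paper combines the three norms is not stated in the abstract.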
