Mutation-based compact genetic algorithm for spectroscopy variable selection in determining protein concentration in wheat grain

Wheat is the third most produced grain in the world after maize and rice. Determining the protein concentration in wheat grain is one of the major challenges in assessing its industrial quality. Wheat samples can be analyzed with a spectrophotometer, and the challenge is to relate the absorbed energy measured by the device to the protein concentration in the wheat. The device records intensities for hundreds of variables that can be related to the physicochemical properties of the sample. Selecting a subset of uncorrelated variables has been shown to be fundamental for establishing correct correlations and reducing prediction error. This work proposes a new formulation of a compact genetic algorithm that uses only a mutation operator. The results of the proposed approach are compared with those of traditional techniques for spectroscopy variable selection, such as the successive projections algorithm, partial least squares, and classical formulations of genetic algorithms. For near-infrared spectral analysis of the protein concentration in wheat, the prediction error decreased from 0.28 to 0.10 on average, a reduction of 63%.
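
To make the idea of a mutation-only compact genetic algorithm for variable selection more concrete, the following is a minimal sketch and not the formulation proposed in this work, whose details are not given in the abstract. It assumes a probability vector stands in for the population, a single elite subset is mutated by resampling a few bits from that vector, and the fitness is the validation RMSE of an ordinary least-squares fit. The function names (`make_synthetic_spectra`, `rmse_of_subset`, `mutation_cga`), the parameter values, and the synthetic data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def make_synthetic_spectra(n_samples=120, n_vars=30, seed=1):
    """Synthetic stand-in for spectra (NOT wheat data): y depends on 3 variables."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_vars))
    y = 2.0 * X[:, 3] - 1.5 * X[:, 11] + 0.5 * X[:, 20]
    y = y + 0.1 * rng.normal(size=n_samples)
    return X, y


def rmse_of_subset(mask, X_train, y_train, X_val, y_val):
    """Validation RMSE of a least-squares fit using only the selected columns."""
    if not mask.any():
        return np.inf  # an empty subset cannot predict anything
    A_train = np.column_stack([np.ones(len(X_train)), X_train[:, mask]])
    A_val = np.column_stack([np.ones(len(X_val)), X_val[:, mask]])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    residual = A_val @ coef - y_val
    return float(np.sqrt(np.mean(residual ** 2)))


def mutation_cga(X_train, y_train, X_val, y_val, n_iter=300,
                 pop_size=50, mutation_rate=0.05, seed=0):
    """Assumed mutation-only compact GA; returns (mask, rmse, initial_rmse, prob)."""
    rng = np.random.default_rng(seed)
    n_vars = X_train.shape[1]
    prob = np.full(n_vars, 0.5)          # compact model of the population
    elite = rng.random(n_vars) < prob    # initial subset sampled from the model
    elite_err = rmse_of_subset(elite, X_train, y_train, X_val, y_val)
    initial_err = elite_err
    step = 1.0 / pop_size                # update size of a virtual population
    for _ in range(n_iter):
        # Mutation: resample a few bits of the elite from the probability vector.
        resample = rng.random(n_vars) < mutation_rate
        challenger = np.where(resample, rng.random(n_vars) < prob, elite)
        err = rmse_of_subset(challenger, X_train, y_train, X_val, y_val)
        if err < elite_err:
            winner, loser = challenger, elite
            elite, elite_err = challenger, err
        else:
            winner, loser = elite, challenger
        # Shift the model toward the winner wherever the two subsets differ.
        differ = winner != loser
        prob[differ] += np.where(winner[differ], step, -step)
        np.clip(prob, step, 1.0 - step, out=prob)  # keep some diversity
    return elite, elite_err, initial_err, prob


X, y = make_synthetic_spectra()
mask, err, initial_err, prob = mutation_cga(X[:80], y[:80], X[80:], y[80:])
print("selected variables:", np.flatnonzero(mask))
print("validation RMSE: initial", initial_err, "final", err)
```

Because the elite is replaced only when a challenger has a strictly lower error, the final validation RMSE in this sketch can never exceed the initial one. No crossover operator appears anywhere, which is the only property it is meant to share with the approach described above.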