Determining the Most Important Physiological and Agronomic Traits Contributing to Maize Grain Yield through Machine Learning Algorithms: A New Avenue in Intelligent Agriculture

Prediction is an attempt to forecast the outcome of a specific situation accurately, using input information from a set of variables that potentially describe that situation. Predictive models can be used to project physiological and agronomic processes, which matters because agronomic traits such as yield are affected by a large number of variables. In this study, we analyzed a large number of physiological and agronomic traits with screening, clustering, and decision tree models to select the factors most relevant to increasing maize grain yield. Decision tree models (with nearly the same performance evaluation) were the most useful tools for understanding the underlying relationships among physiological and agronomic features and for selecting the most important and relevant traits (sowing date-location, kernel number per ear, maximum water content, kernel weight, and season duration) corresponding to maize grain yield. In particular, the decision tree generated by the C&RT algorithm was the best model for yield prediction based on physiological and agronomic traits, and it can be employed extensively in future breeding programs. No significant differences among the decision tree models were found when feature selection filtering was applied to the data, but a positive effect of feature selection was observed in the clustering models. Finally, the results showed that the proposed modeling techniques are useful tools for crop physiologists searching large datasets for patterns in physiological and agronomic factors, and may assist in selecting the most important traits for an individual site and field. Decision tree models are the method of choice because their hierarchical structure ranks features and reveals patterns across feature combinations, thereby illustrating different pathways to yield increase in breeding programs.
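The feature ranking that a regression tree produces can be illustrated with a small sketch. It is not the authors' workflow or software. It only shows the least-squares splitting criterion that C&RT-style trees use for a continuous target such as yield: each trait is scored by the largest reduction in squared error that a single binary split on it achieves. The trait names, the synthetic data, and the helper functions (`sse`, `best_split`, `rank_features`, `make_demo_rows`) are all hypothetical and chosen for illustration.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, target, feature):
    """Return (threshold, sse_reduction) for the best binary split on one feature."""
    parent = sse([r[target] for r in rows])
    ordered = sorted(rows, key=lambda r: r[feature])
    best = (None, 0.0)
    for i in range(1, len(ordered)):
        lo, hi = ordered[i - 1][feature], ordered[i][feature]
        if lo == hi:
            continue  # no threshold can separate identical values
        left = [r[target] for r in ordered[:i]]
        right = [r[target] for r in ordered[i:]]
        gain = parent - sse(left) - sse(right)
        if gain > best[1]:
            best = ((lo + hi) / 2, gain)
    return best


def rank_features(rows, target, features):
    """Rank features by the error reduction of their best root-level split."""
    scores = {f: best_split(rows, target, f)[1] for f in features}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def make_demo_rows(n=60, seed=0):
    """Synthetic records (an assumption, not data from the study).

    Yield is driven mostly by kernel number, slightly by kernel weight,
    and not at all by season duration.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        kn = rng.uniform(300, 600)   # kernels per ear
        kw = rng.uniform(200, 300)   # kernel weight, mg
        sd = rng.uniform(110, 150)   # season duration, days
        y = 0.02 * kn + 0.005 * kw + rng.gauss(0, 0.2)
        rows.append({
            "kernel_number_per_ear": kn,
            "kernel_weight_mg": kw,
            "season_duration_days": sd,
            "grain_yield": y,
        })
    return rows


FEATURES = ["kernel_number_per_ear", "kernel_weight_mg", "season_duration_days"]
ranking = rank_features(make_demo_rows(), "grain_yield", FEATURES)
for name, gain in ranking:
    print(f"{name}: SSE reduction {gain:.2f}")
```

A full tree repeats this search recursively inside each child node, which is what yields the hierarchy of traits and the combinations of splits described above. The sketch stops at the root split to keep the ranking idea visible.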
