Discovering weather periods and crop properties favorable for coffee rust incidence from feature selection approaches

Abstract Coffee Leaf Rust (CLR) is a disease that causes considerable losses in the worldwide coffee industry, such as those reported recently in Colombia and Central America. Early detection of conditions favorable for epidemics could improve decision making for coffee growers and thus reduce the losses due to the disease. Researchers have tried to predict the occurrence of the disease earlier through statistical and machine learning models built from crop properties, disease indicators, and weather conditions. These studies considered the impact of all weather variables over a single common period. Assuming that the weather dynamics that most affect the development of the disease all occur in the same time period is simplistic. We propose an approach to discover, for each weather variable and crop-related feature, the time period (window) that best explains a future observed CLR incidence, in order to obtain a prediction model through machine learning. Feature Selection methods (Filter, Wrapper, Embedded) were used to select the variables most related to coffee rust incidence and to reject the features that contribute no significant information to the machine learning task. In this way, a CLR incidence prediction model based on the features with the greatest impact on the development of the disease was obtained. Moreover, the use of SHapley Additive exPlanations allowed us to identify the impact of each feature on the model prediction. The monitored coffee rust incidence is the most important predictor, since it provides information about the current inoculum, which determines how much the incidence can grow or decrease. Temperature is a determining driver for the germination and penetration phases in days 9 to 6 and 4 to 1 before the date of prediction. Additionally, the amount of rain determines whether uredospore dispersal or washing conditions occurred. The expected mean absolute error of the model is 6.94% of incidence, when trained with the XGBoost algorithm on the dataset reduced by the Embedded method. The estimate of disease incidence 28 days later can be used to improve decision making in control and nutrition practices.
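The idea of searching a separate time window for each weather variable can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a simple Filter-style criterion (absolute Pearson correlation between the window mean and the future incidence) and uses synthetic data in which the signal is planted in days 9 to 6 before the prediction date. The function names, the array layout, and the synthetic values are all hypothetical.

```python
import numpy as np


def window_mean(series, start, end):
    """Mean of a daily series over days `start`..`end` before the prediction date.

    `series` has shape (n_samples, n_days); column d - 1 holds the value
    observed d days before the prediction date (assumed layout).
    """
    return series[:, end - 1:start].mean(axis=1)


def best_window(series, target, min_len=2):
    """Scan every window of at least `min_len` days and return the one whose
    mean has the highest absolute Pearson correlation with `target`."""
    n_days = series.shape[1]
    best = (None, None, 0.0)
    for start in range(min_len, n_days + 1):        # farthest day of the window
        for end in range(1, start - min_len + 2):   # closest day of the window
            feature = window_mean(series, start, end)
            r = np.corrcoef(feature, target)[0, 1]
            if abs(r) > abs(best[2]):
                best = (start, end, r)
    return best


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


# Synthetic example (hypothetical values): 300 observations, 14 days of
# daily temperature, with the target driven by days 9 to 6 only.
rng = np.random.default_rng(0)
temperature = rng.normal(22.0, 3.0, size=(300, 14))
incidence = 2.0 * temperature[:, 5:9].mean(axis=1) + rng.normal(0.0, 0.5, 300)

start, end, r = best_window(temperature, incidence)
print(f"best window: days {start} to {end} before prediction, r = {r:.3f}")
```

Running the same search once per weather variable yields one window per variable, and the resulting window means could then be passed to a Wrapper or Embedded selection step and to a regressor such as XGBoost. The `mean_absolute_error` helper shows how an error figure like the one reported in the abstract is computed from observed and predicted incidence.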
