A Guideline for Building Large Coffee Rust Samples Applying Machine Learning Methods

Coffee rust has become a serious concern for many coffee farmers and manufacturers. The American Phytopathological Society describes it as “the most economically important coffee disease in the world,” while “in monetary value, coffee is the most important agricultural product in international trade”. The need for early detection has led researchers to apply supervised learning algorithms to predict the appearance of the disease. However, the main drawback of the related work is the small number of samples of the dependent variable, the Incidence Rate of Rust: the datasets do not represent the disease reliably, which leads to inaccurate classifiers. This paper provides a guideline for increasing the number of coffee rust samples with machine learning methods, based on a systematic review of the coffee rust literature that is used to select appropriate algorithms.
