Crop Disease Protection Using Parallel Machine Learning Approaches

Crop diseases have for many years been the most important biological hazards to sustainable agricultural production. Every year, 42% of the global agricultural yield is destroyed by disease. Bioinformatics techniques provide efficient methods for analyzing and interpreting raw biological data, which helps in studying the effect of a pathogen on a crop. Microarray gene expression data represent the expression levels of the genes of a cell (organism) maintained in a particular environment. Gene expression data can therefore be used to predict significant genes and to study pathogen–host interactions, and different machine learning techniques can be applied to extract the useful information carried by the candidate genes. The approach proposed in this chapter consists of three stages: preprocessing of the gene expression data, gene selection or feature extraction using a parallel approach, and classification. Five feature selection methods are analyzed for extracting candidate genes with biological significance for rice-related diseases: a support vector machine with recursive feature elimination (SVM-RFE), minimum redundancy maximum relevance (mRMR), principal component analysis (PCA), successive feature selection (SFS) and independent component analysis (ICA). To deal with the computational complexity and the large volume of data, a combination of general-purpose graphics processing unit (GPGPU) computing and MapReduce programming on the Apache Hadoop framework is proposed. The experimental results show improved time efficiency in feature extraction and classification.
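
The idea of parallel gene selection in a MapReduce style can be illustrated with a minimal sketch. It is not the chapter's implementation: it runs in a single Python process rather than on Hadoop or a GPU, and it replaces SVM-RFE, mRMR and the other methods named above with a simple Fisher-style filter score, chosen only because its statistics (counts, sums and sums of squares) merge cleanly across data partitions. The toy data, the class labels `infected` and `healthy`, and all function names are hypothetical.

```python
from functools import reduce

# Hypothetical toy data: one (class_label, expression_vector) pair per sample.
SAMPLES = [
    ("infected", [5.0, 1.0, 3.0]),
    ("infected", [6.0, 1.2, 2.9]),
    ("infected", [5.5, 0.9, 3.1]),
    ("healthy", [1.0, 1.1, 3.0]),
    ("healthy", [1.5, 1.0, 3.2]),
    ("healthy", [0.5, 0.9, 2.8]),
]


def map_partition(partition):
    """Map step: per class, accumulate sample count, sum and sum of squares per gene."""
    stats = {}
    for label, values in partition:
        zeros = [0.0] * len(values)
        n, s, ss = stats.get(label, (0, zeros, zeros))
        stats[label] = (
            n + 1,
            [a + v for a, v in zip(s, values)],
            [a + v * v for a, v in zip(ss, values)],
        )
    return stats


def reduce_stats(left, right):
    """Reduce step: merge the partial statistics of two partitions."""
    merged = dict(left)
    for label, (n, s, ss) in right.items():
        if label in merged:
            m, t, tt = merged[label]
            merged[label] = (
                m + n,
                [a + b for a, b in zip(t, s)],
                [a + b for a, b in zip(tt, ss)],
            )
        else:
            merged[label] = (n, s, ss)
    return merged


def fisher_scores(stats, class_a, class_b):
    """Score each gene by squared mean difference over summed class variances."""
    na, sa, ssa = stats[class_a]
    nb, sb, ssb = stats[class_b]
    scores = []
    for g in range(len(sa)):
        mean_a, mean_b = sa[g] / na, sb[g] / nb
        var_a = max(ssa[g] / na - mean_a ** 2, 0.0)
        var_b = max(ssb[g] / nb - mean_b ** 2, 0.0)
        # Small constant avoids division by zero for constant genes.
        scores.append((mean_a - mean_b) ** 2 / (var_a + var_b + 1e-12))
    return scores


def rank_genes(samples, n_partitions=2, class_a="infected", class_b="healthy"):
    """Split the samples, map each partition, reduce, and rank genes by score."""
    partitions = [samples[i::n_partitions] for i in range(n_partitions)]
    combined = reduce(reduce_stats, map(map_partition, partitions), {})
    scores = fisher_scores(combined, class_a, class_b)
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)


if __name__ == "__main__":
    print(rank_genes(SAMPLES))
```

On this toy data, gene 0 separates the two classes most clearly and is ranked first. Because the map step only emits mergeable summaries, the ranking does not depend on how the samples are partitioned, which is the property that would let the map calls run on separate workers.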
