Lung Cancer Prediction Using Neural Network Ensemble with Histogram of Oriented Gradient Genomic Features

This paper reports an experimental comparison of artificial neural network (ANN) and support vector machine (SVM) ensembles with their nonensemble variants for lung cancer prediction. The classifiers were trained to predict lung cancer from patient nucleotide sequences carrying mutations in the epidermal growth factor receptor (EGFR), Kirsten rat sarcoma viral oncogene (KRAS), and tumor suppressor p53 (TP53) genes, collected as biomarkers from the IGDB.NSCLC corpus. The Voss DNA encoding was used to map the nucleotide sequences of mutated and normal genes to equivalent numerical sequences for training the selected classifiers. Two established feature extraction schemes, the histogram of oriented gradient (HOG) and the local binary pattern (LBP), were applied to extract representative genomic features from the encoded sequences. The ANN ensemble with HOG features best fit the training dataset of this study, with an accuracy of 95.90% and a mean square error of 0.0159. This result is promising for automated screening and early detection of lung cancer, and it could assist pathologists in administering targeted molecular therapy and in counseling early-stage lung cancer patients and persons in at-risk populations.
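As an illustration of the encoding step, the Voss representation maps a nucleotide string to four binary indicator sequences, one per base, with a 1 wherever that base occurs. The sketch below is a minimal example under stated assumptions, not the authors' implementation: the function name `voss_encode` is hypothetical, symbols other than A, C, G, and T are assumed to map to 0 in every indicator sequence, and how the four sequences are arranged before HOG or LBP extraction is not specified in the abstract.

```python
def voss_encode(sequence):
    """Return the Voss representation of a DNA string.

    The result is a dict with one binary indicator list per base:
    entry i of result[base] is 1 if sequence[i] == base, else 0.
    Assumption: any symbol outside A, C, G, T (for example N) yields
    0 in all four indicator lists.
    """
    seq = sequence.upper()
    return {base: [1 if symbol == base else 0 for symbol in seq]
            for base in "ACGT"}


if __name__ == "__main__":
    # Toy sequence chosen for illustration only; not data from the study.
    encoded = voss_encode("ATGCA")
    for base, indicator in encoded.items():
        print(base, indicator)
```

For a sequence of standard bases, exactly one of the four indicator sequences holds a 1 at each position, so the encoding preserves the full sequence while giving the classifiers purely numerical input.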
