Somatic mutation detection using ensemble of flexible neural tree model

Advances in next-generation sequencing (NGS) technology have enabled researchers to detect somatic mutations. Much effort has been devoted to improving the accuracy of somatic mutation discovery from tumour/normal NGS data. In this study, a flexible neural tree (FNT) model is proposed to detect somatic mutations in tumour-normal paired sequencing data. To further improve classification accuracy, a new classifier ensemble approach is proposed that uses a radial basis function (RBF) neural network as the nonlinear combination function. The proposed method is applied to real biological datasets from exome capture data and whole-genome shotgun data. The results show that the resulting FNT model uses fewer variables and a reduced set of input features, and that the proposed ensemble learning method significantly improves detection accuracy. Our method also selects 10 important features for somatic mutation detection, which could be used in further analysis of NGS mutations.
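The abstract does not say how the RBF combiner is configured, so the following is only a minimal sketch under stated assumptions. It treats the output scores of several base classifiers (for example, separately evolved FNT models) as the input vector of an RBF network. The sketch assumes one Gaussian hidden unit centred on each training score vector, a single shared width `sigma`, a bias unit, and output weights fitted by ridge-regularised least squares. The names `RBFCombiner`, `sigma` and `ridge`, and the toy scores and labels, are hypothetical and illustrative; they are not the paper's implementation or data.

```python
import math


def sq_dist(u, v):
    # Squared Euclidean distance between two score vectors
    return sum((a - b) ** 2 for a, b in zip(u, v))


def solve(A, b):
    # Solve A x = b by Gaussian elimination with partial pivoting
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


class RBFCombiner:
    """Hypothetical nonlinear combiner of base-classifier scores.

    Assumptions: centres are the training score vectors, all Gaussian units
    share one width sigma, and the linear output layer is fitted by ridge
    regression on the hidden-layer activations.
    """

    def __init__(self, sigma=0.5, ridge=1e-3):
        self.sigma = sigma
        self.ridge = ridge

    def _features(self, z):
        # Hidden layer: one Gaussian unit per centre, plus a constant bias unit
        return [math.exp(-sq_dist(z, c) / (2.0 * self.sigma ** 2))
                for c in self.centers] + [1.0]

    def fit(self, Z, y):
        self.centers = [list(z) for z in Z]
        Phi = [self._features(z) for z in Z]
        m = len(Phi[0])
        # Ridge normal equations: (Phi^T Phi + ridge * I) w = Phi^T y
        A = [[sum(row[i] * row[j] for row in Phi) + (self.ridge if i == j else 0.0)
              for j in range(m)] for i in range(m)]
        rhs = [sum(row[i] * t for row, t in zip(Phi, y)) for i in range(m)]
        self.weights = solve(A, rhs)
        return self

    def score(self, z):
        # Linear output unit over the hidden activations
        return sum(w * f for w, f in zip(self.weights, self._features(z)))

    def predict(self, z, threshold=0.5):
        # 1 = candidate somatic mutation, 0 = not somatic (toy labelling)
        return 1 if self.score(z) >= threshold else 0


# Illustrative only: each row holds the scores of three base classifiers
# for one candidate site; labels are made up for the example.
Z = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.6], [0.2, 0.1, 0.3],
     [0.1, 0.3, 0.2], [0.7, 0.2, 0.9], [0.3, 0.8, 0.1]]
y = [1, 1, 0, 0, 1, 0]
model = RBFCombiner(sigma=0.5, ridge=1e-3).fit(Z, y)
predictions = [model.predict(z) for z in Z]
```

Because the Gaussian units respond to where a score vector lies relative to the training vectors, the combiner can learn decision regions that a weighted vote or simple average of the base scores cannot express. The small ridge term keeps the normal equations solvable even though there are more hidden units (centres plus bias) than training samples.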
