Biomarker identification using next generation sequencing data of RNA

Over the years, numerous studies have been performed to identify messenger RNAs (mRNAs) that are differentially expressed under different biological conditions in various diseases, including cancer. Obtaining complete and noise-free data was always challenging with earlier technologies. The advent of Next-Generation Sequencing (NGS), however, revolutionized genome research, especially the analysis of mRNA expression profiles. Here, NGS data of breast cancer from The Cancer Genome Atlas (TCGA) are used to identify cancer biomarkers. For this purpose, the data are first preprocessed using a statistical test and fold change so that significantly differentially expressed up- and down-regulated mRNAs can be identified. Thereafter, a wrapper-based feature selection approach using Particle Swarm Optimization (PSO) and a Support Vector Machine (SVM) is applied to the preprocessed dataset to identify potential mRNA biomarkers. The top 10 identified biomarkers are COMP, LRRC15, CTHRC1, CILP2, FOXF1, FIGF, PRDM16, LMX1B, IRX5 and LEPREL1. The quantitative results of the proposed method are compared with those of other state-of-the-art methods. Finally, enrichment analysis and KEGG pathway analysis are conducted for the selected mRNAs.
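The two-stage pipeline described above (a per-gene filter by statistical test and fold change, then a PSO wrapper scored by an SVM) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. All of the following are choices made only for this sketch:

- a Wilcoxon rank-sum test (normal approximation), with no multiple-testing correction;
- a pseudo-count of 1 and the cut-offs p < 0.05 and |log2 fold change| ≥ 1;
- a linear SVM trained by Pegasos-style subgradient descent;
- a Kennedy-Eberhart binary PSO with a sigmoid transfer function, and its parameters;
- a fitness defined as validation accuracy minus a small gene-count penalty.

All function names are hypothetical helpers, and the data are synthetic.

```python
import math
import random

# ---- Step 1: statistical test + fold change (assumed: Wilcoxon rank-sum, pseudo-count 1) ----
def ranksum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, average ranks for ties)."""
    pooled = sorted((v, i) for i, v in enumerate(x + y))
    ranks, i = [0.0] * len(pooled), 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):  # tied values share their average 1-based rank
            ranks[pooled[k][1]] = (i + j) / 2 + 1
        i = j + 1
    n1, n2 = len(x), len(y)
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return math.erfc(abs(sum(ranks[:n1]) - mu) / sigma / math.sqrt(2))

def de_filter(tumor, normal, alpha=0.05, min_abs_lfc=1.0):
    """tumor/normal: {gene: [counts]} -> {gene: 'up' | 'down'} for genes passing both cut-offs."""
    selected = {}
    for gene in tumor:
        t, n = tumor[gene], normal[gene]
        lfc = math.log2((sum(t) / len(t) + 1) / (sum(n) / len(n) + 1))
        if abs(lfc) >= min_abs_lfc and ranksum_pvalue(t, n) < alpha:
            selected[gene] = "up" if lfc > 0 else "down"
    return selected

# ---- Step 2: wrapper selection with binary PSO + linear SVM (assumed variants) ----
def train_linear_svm(X, y, lam=0.01, epochs=30, seed=0):
    """Pegasos subgradient descent on the hinge loss; y in {-1, +1}; bias as a constant feature."""
    rng = random.Random(seed)
    X = [row + [1.0] for row in X]
    w, t, order = [0.0] * len(X[0]), 0, list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(a * b for a, b in zip(w, X[i]))
            w = [(1 - eta * lam) * a for a in w]  # L2 regularization shrink step
            if margin < 1:  # hinge loss active: move w to increase this sample's margin
                w = [a + eta * y[i] * b for a, b in zip(w, X[i])]
    return w

def svm_accuracy(w, X, y):
    return sum(yi * sum(a * b for a, b in zip(w, xi + [1.0])) > 0 for xi, yi in zip(X, y)) / len(y)

def binary_pso(fitness, n_feat, n_particles=8, iters=15, inertia=0.7, c1=1.5, c2=1.5, vmax=4.0, seed=0):
    """Binary PSO: each bit (gene kept or not) is set with probability sigmoid(velocity)."""
    rng = random.Random(seed)
    pos = [[rng.random() < 0.5 for _ in range(n_feat)] for _ in range(n_particles)]
    vel = [[0.0] * n_feat for _ in range(n_particles)]
    pbest, pfit = [p[:] for p in pos], [fitness(p) for p in pos]
    g = max(range(n_particles), key=pfit.__getitem__)
    gbest, gfit = pbest[g][:], pfit[g]
    for _ in range(iters):
        for k in range(n_particles):
            for d in range(n_feat):
                v = (inertia * vel[k][d] + c1 * rng.random() * (pbest[k][d] - pos[k][d])
                     + c2 * rng.random() * (gbest[d] - pos[k][d]))
                vel[k][d] = max(-vmax, min(vmax, v))
                pos[k][d] = rng.random() < 1 / (1 + math.exp(-vel[k][d]))
            f = fitness(pos[k])
            if f > pfit[k]:
                pbest[k], pfit[k] = pos[k][:], f
                if f > gfit:
                    gbest, gfit = pos[k][:], f
    return gbest, gfit

# ---- Toy demo on synthetic counts (hypothetical data, not TCGA) ----
rng = random.Random(42)
genes = [f"g{i}" for i in range(10)]
ratio = {"g0": 4.0, "g1": 3.0, "g2": 0.25, "g3": 2.5}  # tumor/normal mean ratio; others 1
tumor = {g: [max(0, round(rng.gauss(100 * ratio.get(g, 1.0), 15))) for _ in range(20)] for g in genes}
normal = {g: [max(0, round(rng.gauss(100, 15))) for _ in range(20)] for g in genes}
de = de_filter(tumor, normal)
feats = sorted(de)
X = [[math.log2(src[g][s] + 1) for g in feats] for src in (tumor, normal) for s in range(20)]
y = [1] * 20 + [-1] * 20  # +1 tumor, -1 normal
Xtr, ytr, Xva, yva = X[0::2], y[0::2], X[1::2], y[1::2]  # alternate samples: train / validation
means = [sum(col) / len(col) for col in zip(*Xtr)]  # center features using training rows only
Xtr = [[v - m for v, m in zip(r, means)] for r in Xtr]
Xva = [[v - m for v, m in zip(r, means)] for r in Xva]

def fitness(mask):
    """Validation accuracy of an SVM on the masked genes, minus a small penalty per gene."""
    cols = [j for j, on in enumerate(mask) if on]
    if not cols:
        return 0.0
    sub = lambda rows: [[r[j] for j in cols] for r in rows]
    return svm_accuracy(train_linear_svm(sub(Xtr), ytr), sub(Xva), yva) - 0.01 * len(cols) / len(mask)

best_mask, best_fit = binary_pso(fitness, len(feats))
print("DE genes:", de)
print("PSO-selected:", [g for g, on in zip(feats, best_mask) if on], "fitness:", round(best_fit, 3))
```

This is a wrapper approach because every candidate gene subset is scored by actually training and validating the classifier. That is costly, but it accounts for interactions between genes that the per-gene filter in step 1 ignores. On real data, a library SVM with cross-validation would normally replace the single toy split used here.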
