A new LSTM-based gene expression prediction model: L-GEPM

Molecular biology combined with in silico machine learning and deep learning has facilitated the broad application of gene expression profiles to gene function prediction, optimal crop breeding, disease-related gene discovery, and drug screening. Although the cost of acquiring genome-wide expression profiles has been steadily declining, the cost of generating a compendium of expression profiles across thousands of samples remains high. The Library of Integrated Network-Based Cellular Signatures (LINCS) program used approximately 1000 landmark genes to predict the expression of the remaining target genes by linear regression; however, this approach ignores the nonlinear features that influence relationships between gene expression levels, which limits its prediction accuracy. We herein propose a gene expression prediction model, L-GEPM, based on long short-term memory (LSTM) neural networks, which captures the nonlinear features affecting gene expression and uses the learned features to predict the target genes. A comparison of the experimental errors and fitting performance of different prediction models shows that the LSTM neural network-based model, L-GEPM, achieves low error and a superior fit.
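The limitation of the linear baseline described above can be illustrated with a minimal sketch. It is not the LINCS pipeline or the L-GEPM implementation: the data are synthetic, the gene counts are scaled down, and the function names (make_profiles, fit_linear, baseline_mae) and the tanh interaction term are assumptions introduced only for illustration. The sketch fits an ordinary least-squares map from "landmark" to "target" expression and reports the held-out mean absolute error, with and without a nonlinear dependence in the data.

```python
import numpy as np


def make_profiles(n_samples=400, n_landmark=20, n_target=5,
                  nonlinear_strength=2.0, seed=0):
    """Synthetic stand-in for expression data (hypothetical, not real profiles).

    Targets are a linear mix of landmark genes plus an optional nonlinear
    interaction between pairs of landmark genes, plus small noise.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_landmark))
    W = rng.normal(size=(n_landmark, n_target))
    interaction = np.tanh(X[:, :n_target] * X[:, n_target:2 * n_target])
    noise = 0.05 * rng.normal(size=(n_samples, n_target))
    Y = X @ W + nonlinear_strength * interaction + noise
    return X, Y


def fit_linear(X, Y):
    """Least-squares fit of Y ~ [X, 1]; returns the coefficient matrix."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Apply the fitted linear map (with intercept) to new landmark values."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ coef


def mean_absolute_error(Y_true, Y_pred):
    """Mean absolute error averaged over samples and target genes."""
    return float(np.mean(np.abs(Y_true - Y_pred)))


def baseline_mae(nonlinear_strength, n_train=300):
    """Held-out MAE of the linear baseline for a given nonlinearity strength."""
    X, Y = make_profiles(nonlinear_strength=nonlinear_strength)
    coef = fit_linear(X[:n_train], Y[:n_train])
    return mean_absolute_error(Y[n_train:], predict_linear(coef, X[n_train:]))


if __name__ == "__main__":
    print("MAE, purely linear targets:   ", baseline_mae(0.0))
    print("MAE, targets with nonlinearity:", baseline_mae(2.0))
```

On purely linear synthetic targets the baseline error stays near the noise level, whereas the interaction term leaves a residual that no linear map can remove. That residual is the kind of structure a nonlinear model such as an LSTM network is intended to capture.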
