Prediction of Plant Lipocalin Genes based on Convolutional Neural Networks

Lipocalins play a key role in regulating biological functions such as the modulation of cell growth and metabolism, the binding of cell-surface receptors, nerve growth and regeneration, and the regulation of immune responses. Identifying and analyzing plant lipocalins has become an important problem in the study of the lipocalin family. Traditional methods such as protein structure analysis, cell localization, and phylogenetic studies are complex and expensive, which has kept progress on plant lipocalins slow relative to deep learning approaches. In this paper, we construct a deep learning model based on a convolutional neural network, called 'LCNet', which achieves a sensitivity of 0.953 and a specificity of 0.941 for plant lipocalin genes. We further verify the prediction performance of LCNet by comparing relative gene expression levels, during the absorption and transport of PCB18, between lipocalin genes already identified biologically in Oryza and genes that LCNet predicts to be Oryza lipocalins. This combination of deep learning and biological experiments offers high precision, simple operation, and low cost; it can reduce the workload of biologists and can be extended to other proteins to solve similar problems.
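The abstract reports sensitivity and specificity but does not say how they were computed. The following is a minimal sketch that assumes the standard definitions for a binary classifier (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)), with label 1 meaning "lipocalin" and 0 meaning "non-lipocalin". The function name `sensitivity_specificity` and the toy labels are illustrative assumptions, not part of LCNet or its data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Assumption: 1 = lipocalin (positive), 0 = non-lipocalin (negative).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")

    sensitivity = tp / (tp + fn)  # fraction of true lipocalins recovered
    specificity = tn / (tn + fp)  # fraction of non-lipocalins rejected
    return sensitivity, specificity


# Toy example (made-up labels, not the paper's data):
# 4 positives with 3 recovered, 5 negatives with 4 rejected.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Reporting both numbers matters because lipocalins are likely rare among all genes in a genome, so a classifier could score high overall accuracy while missing most true positives.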