Using Deep Learning with Position Specific Scoring Matrices to Identify Efflux Proteins in Membrane and Transport Proteins

In recent years, deep learning has emerged as a new area of machine learning, motivated by the goal of moving machine learning closer to artificial intelligence. Deep neural networks have become increasingly important in a variety of fields, where they deliver strong performance. Accordingly, applying deep learning in bioinformatics to improve performance is an important direction. The convolutional neural network (CNN) is a deep learning architecture that is claimed to be the best model for object recognition and detection when combined with GPU computing. In this study, we use a CNN to identify efflux proteins in membrane and transport proteins, a well-known problem in bioinformatics. We construct the CNN from position-specific scoring matrix (PSSM) profiles using CUDA and the Keras package with the Theano backend. This approach achieved a significant improvement over the previous work on efflux proteins. The proposed method can serve as an effective tool for identifying efflux proteins and can help biologists understand their functions. Moreover, this study provides a basis for further research on applying deep learning in bioinformatics.
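A CNN expects inputs of a fixed size, whereas a PSSM profile has one row per residue and therefore varies in length from protein to protein. The abstract does not say how this gap is bridged, so the following is only a minimal sketch of one common option, not the authors' confirmed procedure: rows belonging to the same amino acid are summed, averaged over the sequence length, and squashed with a sigmoid, giving a 20 x 20 matrix. The function name `pssm_to_fixed_input`, the toy sequence, and the random scores are all hypothetical.

```python
import numpy as np

# Column order used in PSI-BLAST PSSM output.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def pssm_to_fixed_input(sequence, pssm):
    """Condense an L x 20 PSSM into a 20 x 20 matrix (assumed preprocessing).

    Row i of the result accumulates the PSSM rows of every position whose
    residue is amino acid i, divided by the sequence length and passed
    through a sigmoid so that all values lie strictly between 0 and 1.
    """
    pssm = np.asarray(pssm, dtype=float)
    if pssm.shape != (len(sequence), 20):
        raise ValueError("pssm must have shape (len(sequence), 20)")

    fixed = np.zeros((20, 20))
    for residue, row in zip(sequence, pssm):
        idx = AMINO_ACIDS.find(residue)
        if idx >= 0:  # skip non-standard residues such as 'X'
            fixed[idx] += row

    fixed /= len(sequence)
    return 1.0 / (1.0 + np.exp(-fixed))


# Toy example: random integer scores stand in for a real PSSM profile.
rng = np.random.default_rng(0)
toy_sequence = "MKVAM"
toy_pssm = rng.integers(-5, 6, size=(len(toy_sequence), 20))

features = pssm_to_fixed_input(toy_sequence, toy_pssm)

# Add batch and channel axes; the (batch, height, width, channels) layout
# is an assumption and depends on how the backend is configured.
cnn_input = features[np.newaxis, :, :, np.newaxis]
```

With this representation every protein, whatever its length, maps to the same 20 x 20 grid, which could then be treated like a single-channel image by the convolutional layers.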
