EMNets: A Convolutional Autoencoder for Protein Surface Retrieval Based on Cryo-Electron Microscopy Imaging

Protein surface shape plays an essential role in many protein functions. To investigate protein function and evolutionary history efficiently, we introduce a global protein surface shape representation called EMNets. EMNets provides an effective and accurate way to represent protein surfaces and search for similar ones, and thus contributes to biomedical research. The method uses a Convolutional Autoencoder (CAE) neural network to learn the geometric information of three-dimensional (3D) density maps in a data-driven manner. It represents a 3D cryo-electron microscopy density map with a descriptor of only 256 numeric variables, called the EMNets descriptor. Using the EMNets descriptor, we can retrieve similar protein surfaces in real time with the k-nearest-neighbor algorithm. The search results for protein surfaces represented with the EMNets descriptor show high agreement with those of the existing Combinatorial Extension (CE) algorithm for sequence and structure similarity search. Overall, EMNets is a powerful tool for comparing 3D protein structures obtained by cryo-electron microscopy.
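The retrieval step described above can be illustrated with a minimal sketch. It assumes each density map has already been encoded into a 256-value descriptor; the Euclidean metric, the brute-force search, the function names (`euclidean`, `knn_search`), the map identifiers, and the random placeholder descriptors are all assumptions made for illustration and are not taken from the paper.

```python
import heapq
import math
import random

DESCRIPTOR_SIZE = 256  # descriptor length stated in the abstract


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_search(query, database, k=3):
    """Return the k (distance, map_id) pairs closest to the query, nearest first."""
    scored = ((euclidean(query, desc), map_id) for map_id, desc in database.items())
    return heapq.nsmallest(k, scored)


# Placeholder data: random vectors stand in for real encoder outputs (hypothetical).
rng = random.Random(0)
database = {
    f"map_{i}": [rng.random() for _ in range(DESCRIPTOR_SIZE)] for i in range(100)
}

# A slightly perturbed copy of map_42 serves as the query descriptor.
query = [v + rng.uniform(-0.01, 0.01) for v in database["map_42"]]

neighbors = knn_search(query, database, k=3)
for dist, map_id in neighbors:
    print(f"{map_id}\t{dist:.4f}")
```

Because every descriptor has a fixed, small length, a linear scan like this costs one 256-term distance per stored map, which is one plausible reason such a search can run interactively; a larger collection might call for an index structure instead.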
