Positive-unlabeled convolutional neural networks for particle picking in cryo-electron micrographs

Cryo-electron microscopy is a popular method for the determination of protein structures; however, identifying a sufficient number of particles for analysis can take months of manual effort. Current computational approaches find many false positives and require ad hoc postprocessing, especially for unusually shaped particles. To address these shortcomings, we develop Topaz, an efficient and accurate particle-picking pipeline using neural networks trained with a general-purpose positive-unlabeled learning method. This framework enables particle detection models to be trained with few sparsely labeled particles and no labeled negatives. Topaz retrieves many more real particles than conventional picking methods while maintaining low false-positive rates, is capable of picking challenging, unusually shaped proteins (for example, small, non-globular and asymmetric particles), produces more representative particle sets and does not require post hoc curation. We demonstrate the performance of Topaz on two difficult datasets and three conventional datasets. Topaz is modular, standalone, free and open source (http://topaz.csail.mit.edu).

The challenge of accurate particle picking in cryo-EM analysis is addressed with Topaz, a neural-network-based algorithm that shows advantages over other tools, especially in picking unusually shaped particles.
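To make the idea of training with labeled positives and no labeled negatives concrete, the following is a minimal sketch of one well-known positive-unlabeled objective, the non-negative PU risk estimator with a logistic loss. It is an illustration only: the abstract does not say which objective Topaz optimizes, and the function name `nn_pu_risk`, its arguments, and the class prior `pi` (the assumed fraction of unlabeled regions that are truly particles) are hypothetical choices made for this example.

```python
import numpy as np


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return np.logaddexp(0.0, z)


def nn_pu_risk(scores_pos, scores_unl, pi):
    """Non-negative PU risk with logistic loss (illustrative sketch).

    scores_pos: classifier scores (logits) for labeled positive regions.
    scores_unl: classifier scores (logits) for unlabeled regions.
    pi: assumed class prior, i.e. the fraction of positives among the
        unlabeled data. This is a user-supplied assumption, not something
        the estimator learns.
    """
    scores_pos = np.asarray(scores_pos, dtype=float)
    scores_unl = np.asarray(scores_unl, dtype=float)

    # Loss of labeled positives when treated as positives.
    risk_pos_as_pos = softplus(-scores_pos).mean()
    # Loss of labeled positives if they were treated as negatives.
    risk_pos_as_neg = softplus(scores_pos).mean()
    # Loss of unlabeled data when treated as negatives.
    risk_unl_as_neg = softplus(scores_unl).mean()

    # Estimate the risk on true negatives by removing the contribution of
    # the positives hidden in the unlabeled set, then clamp at zero so the
    # estimate cannot go negative when the model overfits.
    neg_risk = risk_unl_as_neg - pi * risk_pos_as_neg
    return pi * risk_pos_as_pos + max(0.0, neg_risk)
```

The clamp at zero is the design choice that matters here: without it, a flexible model can drive the estimated negative risk below zero by memorizing the few labeled positives, which is a realistic concern when labels are sparse.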
