Positive-unlabeled convolutional neural networks for particle picking in cryo-electron micrographs.

Cryo-electron microscopy (cryoEM) is fast becoming the preferred method for protein structure determination. Particle picking is a significant bottleneck in solving protein structures from single particle cryoEM. Hand-labeling a sufficient number of particles can take months of effort, and current computational approaches are often ineffective. Here, we frame particle picking as a positive-unlabeled classification problem in which we seek to learn a convolutional neural network (CNN) that classifies micrograph regions as particle or background from a small number of labeled positive examples and many unlabeled examples. However, model fitting with very few labeled data points is a challenging machine learning problem. To address this, we develop a novel objective function, GE-binomial, for learning model parameters in this context. This objective uses a newly formulated generalized expectation (GE) criterion to learn effectively from unlabeled data when using minibatched stochastic gradient descent optimizers. On a high-quality publicly available cryoEM data set, we show that CNNs trained with this objective classify particles accurately with very few positive training examples. Using 1000 randomly sampled particles (out of 100k total) as references, EMAN2’s byRef method achieves 33% precision at 90% recall. With the same 1000 labeled training particles, we improve this result by roughly 40% in relative terms, to 46% precision at 90% recall. Remarkably, at 90% recall we achieve 41% precision with one-tenth as many labeled particles and still reach 34% precision with only one-hundredth as many. At all numbers of labeled particles, we improve substantially over EMAN2’s area under the precision-recall curve (AUPR). Our relative performance increase is even greater on a difficult unpublished dataset supplied by the Shapiro lab. Furthermore, we show that incorporating an autoencoder improves generalization when very few labeled data points are available. We also compare our GE-binomial method with other positive-unlabeled learning methods never before applied to particle picking. We expect our particle picking tool, Topaz, based on CNNs trained with GE-binomial, to be an essential component of single particle cryoEM analysis and our GE-binomial objective function to be widely applicable to positive-unlabeled classification problems.
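The headline comparisons above are stated as precision at 90% recall. As an illustration of what that metric measures, the following is a minimal sketch in Python with NumPy. The function name `precision_at_recall` and the toy scores are hypothetical and are not taken from Topaz or EMAN2; the sketch assumes at least one positive label and does not treat tied scores specially.

```python
import numpy as np

def precision_at_recall(scores, labels, target_recall=0.9):
    """Precision at the first ranked cutoff whose recall reaches target_recall.

    scores: classifier scores, higher means more particle-like.
    labels: 1 for a true particle, 0 for background (assumes >= 1 positive).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)

    # Rank candidate regions from highest to lowest score.
    order = np.argsort(-scores, kind="stable")
    hits = labels[order]

    # Cumulative true positives after keeping the top-k candidates.
    true_positives = np.cumsum(hits)
    kept = np.arange(1, len(hits) + 1)

    recall = true_positives / hits.sum()
    precision = true_positives / kept

    # Recall is non-decreasing in k, so a binary search finds the
    # smallest k whose recall is at least the target.
    idx = np.searchsorted(recall, target_recall, side="left")
    return float(precision[idx])

# Toy example (made-up numbers): two particles among five candidates.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
toy_labels = [1, 0, 1, 0, 0]
toy_result = precision_at_recall(toy_scores, toy_labels, target_recall=0.9)
```

In the toy example, both particles are recovered only once the top three candidates are kept, so the precision at 90% recall is two out of three.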
