Spark-based parallel calculation of 3D Fourier shell correlation for macromolecule structure local resolution estimation

Background: Resolution estimation is the main evaluation criterion for the reconstruction of macromolecular 3D structures in cryo-electron microscopy (cryo-EM). Many methods exist for evaluating the 3D resolution of macromolecular structures reconstructed by Single Particle Analysis (SPA) in cryo-EM and by subtomogram averaging (SA) in electron cryotomography (cryo-ET). Most are global methods: they measure the resolution of the structure as a whole and are therefore inaccurate at detecting subtle local changes in the reconstruction. To detect such changes in SPA and SA reconstructions, a few local resolution methods have been proposed. The mainstream local resolution evaluation methods are based on local Fourier shell correlation (FSC), which is computationally intensive. However, the existing implementations rely on multi-threading on a single computer and scale very poorly.

Results: This paper proposes a new fine-grained 3D array partition method that uses a key-value (K-V) format in Spark. The method first converts 3D images to K-V data, and then uses the K-V data for 3D array partitioning and data exchange in parallel. The resulting Spark-based distributed parallel computing framework addresses the scalability problem described above: all 3D local FSC tasks are calculated simultaneously across multiple nodes of a computer cluster. On experimental data, the 3D local resolution evaluation algorithm based on Spark fine-grained 3D array partitioning achieves an order-of-magnitude improvement in computing speed over the mainstream FSC algorithm while leaving accuracy unchanged, and it offers better fault tolerance and scalability.

Conclusions: In this paper, we proposed a K-V format based fine-grained 3D array partition method in Spark for calculating 3D FSC in parallel to obtain a 3D local resolution density map. The 3D local resolution density map evaluates the three-dimensional density maps reconstructed from single particle analysis and subtomogram averaging. The proposed method can significantly increase the speed of 3D local resolution evaluation, which is important for the efficient detection of subtle variations among reconstructed macromolecular structures.
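
The idea of turning a 3D array into key-value data and computing a local FSC per partition can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names (`to_key_value`, `fsc`, `local_fsc`), the use of non-overlapping cubic blocks keyed by their origin, and the integer-radius shell binning are all assumptions made for illustration, and the sketch runs on a single machine rather than on a Spark cluster.

```python
import numpy as np

def to_key_value(volume, block):
    """Split a 3D array into (key, sub-volume) pairs.

    Assumption: the key is the (z, y, x) origin of a non-overlapping cubic block.
    """
    nz, ny, nx = volume.shape
    pairs = []
    for z in range(0, nz - block + 1, block):
        for y in range(0, ny - block + 1, block):
            for x in range(0, nx - block + 1, block):
                sub = volume[z:z + block, y:y + block, x:x + block]
                pairs.append(((z, y, x), sub))
    return pairs

def fsc(a, b):
    """Fourier shell correlation curve of two equally sized cubic volumes."""
    n = a.shape[0]
    fa = np.fft.fftshift(np.fft.fftn(a))
    fb = np.fft.fftshift(np.fft.fftn(b))
    # Integer distance of every Fourier voxel from the zero-frequency voxel.
    grid = np.indices(a.shape) - n // 2
    radius = np.rint(np.sqrt((grid ** 2).sum(axis=0))).astype(int).ravel()
    # Per-shell sums of the cross term and of the two power terms.
    num = np.bincount(radius, weights=(fa * np.conj(fb)).real.ravel())
    pow_a = np.bincount(radius, weights=(np.abs(fa) ** 2).ravel())
    pow_b = np.bincount(radius, weights=(np.abs(fb) ** 2).ravel())
    den = np.sqrt(pow_a * pow_b)
    curve = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return curve[: n // 2]  # keep shells that lie fully inside the cube

def local_fsc(half_map_1, half_map_2, block):
    """Match blocks of two half maps by key and compute one FSC curve per key."""
    blocks_2 = dict(to_key_value(half_map_2, block))
    return {key: fsc(sub, blocks_2[key])
            for key, sub in to_key_value(half_map_1, block)}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    half_1 = rng.standard_normal((16, 16, 16))
    half_2 = half_1 + 0.5 * rng.standard_normal((16, 16, 16))
    for key, curve in local_fsc(half_1, half_2, block=8).items():
        print(key, np.round(curve, 2))
```

Because every block carries its own key, the per-block FSC computations are independent of one another. In a Spark setting, the two lists of pairs could plausibly be distributed as RDDs, joined by key, and mapped through `fsc`; that mapping is an assumption about how the sketch would translate, not a description of the authors' code.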
