A Multicore Path to Connectomics-on-Demand

The current design trend in large-scale machine learning is to use distributed clusters of CPUs and GPUs with MapReduce-style programming. Some have been led to believe that this type of horizontal scaling can reduce or even eliminate the need for traditional algorithm development, careful parallelization, and performance engineering. This paper is a case study showing the contrary: the benefits of algorithms, parallelization, and performance engineering can sometimes be so vast that it is possible to solve "cluster-scale" problems on a single commodity multicore machine. Connectomics is an emerging area of neurobiology that uses cutting-edge machine learning and image processing to extract brain connectivity graphs from electron microscopy images. It has long been assumed that processing connectomics data will require mass storage and farms of CPUs and GPUs, and will take months (if not years) of processing time. We present a high-throughput connectomics-on-demand system that runs on a multicore machine with fewer than 100 cores and extracts connectomes at the terabyte-per-hour pace of modern electron microscopes.
