Mesorasi: Architecture Support for Point Cloud Analytics via Delayed-Aggregation

Point cloud analytics is poised to become a key workload on battery-powered embedded and mobile platforms in a wide range of emerging application domains, such as autonomous driving, robotics, and augmented reality, where efficiency is paramount. This paper proposes Mesorasi, an algorithm-architecture co-designed system that simultaneously improves the performance and energy efficiency of point cloud analytics while retaining its accuracy.Our extensive characterizations of state-of-the-art point cloud algorithms show that, while structurally reminiscent of convolutional neural networks (CNNs), point cloud algorithms exhibit inherent compute and memory inefficiencies due to the unique characteristics of point cloud data. We propose delayed-aggregation, a new algorithmic primitive for building efficient point cloud algorithms. Delayed-aggregation hides the performance bottlenecks and reduces the compute and memory redundancies by exploiting the approximately distributive property of key operations in point cloud algorithms. Delayed-aggregation let point cloud algorithms achieve 1.6× speedup and 51.1% energy reduction on a mobile GPU while retaining the accuracy (-0.9% loss to 1.2% gains). To maximize the algorithmic benefits, we propose minor extensions to contemporary CNN accelerators, which can be integrated into a mobile Systems-on-a-Chip (SoC) without modifying other SoC components. With additional hardware support, Mesorasi achieves up to 3.6× speedup.

[1]  Mahmut T. Kandemir,et al.  Race-To-Sleep + Content Caching + Display Caching: A Recipe for Energy-efficient Video Streaming on Handhelds , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[2]  Jiwen Lu,et al.  DensePoint: Learning Densely Contextual Representation for Efficient Point Cloud Processing , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[3]  Pierre Alliez,et al.  Culture 3D Cloud: A Cloud Computing Platform for 3D Scanning, Documentation, Preservation and Dissemination of Cultural Heritage , 2017, ERCIM News.

[4]  David A. Patterson,et al.  In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[5]  Dongrui Fan,et al.  HyGCN: A GCN Accelerator with Hybrid Architecture , 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[6]  Chao-Tsung Huang,et al.  eCNN: A Block-Based and Highly-Parallel CNN Accelerator for Edge Inference , 2019, MICRO.

[7]  Jan Eric Lenssen,et al.  Fast Graph Representation Learning with PyTorch Geometric , 2019, ArXiv.

[8]  Fuxin Li,et al.  PointConv: Deep Convolutional Networks on 3D Point Clouds , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Fabian Gieseke,et al.  Buffer k-d Trees: Processing Massive Nearest Neighbor Queries on GPUs , 2014, ICML.

[10]  Mahmut T. Kandemir,et al.  VIP: Virtualizing IP chains on handheld platforms , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[11]  Nam Sung Kim,et al.  GANAX: A Unified MIMD-SIMD Acceleration for Generative Adversarial Networks , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[12]  Leonidas J. Guibas,et al.  Frustum PointNets for 3D Object Detection from RGB-D Data , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13]  Yue Wang,et al.  Dynamic Graph CNN for Learning on Point Clouds , 2018, ACM Trans. Graph..

[14]  Nico Blodow,et al.  Fast Point Feature Histograms (FPFH) for 3D registration , 2009, 2009 IEEE International Conference on Robotics and Automation.

[15]  Yuhao Zhu,et al.  Tigris: Architecture and Algorithms for 3D Perception in Point Clouds , 2019, MICRO.

[16]  Suren Jayasuriya,et al.  EVA²: Exploiting Temporal Redundancy in Live Computer Vision , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[17]  Mahmut T. Kandemir,et al.  Distilling the Essence of Raw Video to Reduce Memory Usage and Energy at Edge Devices , 2019, MICRO.

[18]  Lingfan Yu,et al.  Deep Graph Library: A Graph-Centric, Highly-Performant Package for Graph Neural Networks. , 2019 .

[19]  Cyrill Stachniss,et al.  SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[20]  Joel Emer,et al.  Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks , 2016, CARN.

[21]  Andreas Nüchter,et al.  GPU-Accelerated Nearest Neighbor Search for 3D Registration , 2009, ICVS.

[22]  Alexander J. Smola,et al.  Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs , 2019, ArXiv.

[23]  Jose-Maria Arnau,et al.  Computation Reuse in DNNs by Exploiting Input Similarity , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[24]  Yuhao Zhu,et al.  ASV: Accelerated Stereo Vision System , 2019, MICRO.

[25]  Mostafa Mahmoud,et al.  Diffy: a Déjà vu-Free Differential Deep Neural Network Accelerator , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[26]  Matthew Mattina,et al.  Euphrates: Algorithm-SoC Co-Design for Low-Power Mobile Continuous Vision , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[27]  Markus Gross,et al.  Point-Based Graphics , 2007 .

[28]  Hideharu Amano,et al.  An FPGA Acceleration for the Kd-tree Search in Photon Mapping , 2013, ARC.

[29]  Song Han,et al.  Point-Voxel CNN for Efficient 3D Deep Learning , 2019, NeurIPS.

[30]  Jianxiong Xiao,et al.  3D ShapeNets: A deep representation for volumetric shapes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Christoforos E. Kozyrakis,et al.  TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory , 2017, ASPLOS.

[32]  Mahmut T. Kandemir,et al.  Short-Circuiting Memory Traffic in Handheld Platforms , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[33]  Amrita Mazumdar,et al.  Exploring computation-communication tradeoffs in camera systems , 2017, 2017 IEEE International Symposium on Workload Characterization (IISWC).

[34]  George A. Constantinides,et al.  FPGA-based K-means clustering using tree-based data structures , 2013, 2013 23rd International Conference on Field programmable Logic and Applications.

[35]  Mark Whitty,et al.  Autonomous Navigation using a Real-Time 3 D Point Cloud , 2010 .

[36]  David Harris,et al.  CMOS VLSI Design: A Circuits and Systems Perspective , 2004 .

[37]  Marc Levoy,et al.  The Use of Points as a Display Primitive , 2000 .

[38]  Federico Tombari,et al.  Unique Signatures of Histograms for Local Surface Description , 2010, ECCV.

[39]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Leonidas J. Guibas,et al.  ShapeNet: An Information-Rich 3D Model Repository , 2015, ArXiv.

[41]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[42]  Markus H. Gross,et al.  A hardware processing unit for point sets , 2008, GH '08.

[43]  Yuhao Zhu,et al.  Energy-Efficient Video Processing for Virtual Reality , 2020, IEEE Micro.

[44]  Jie Cheng,et al.  Programming Massively Parallel Processors. A Hands-on Approach , 2010, Scalable Comput. Pract. Exp..

[45]  Jonathan Dyssel Stets,et al.  Visualization and labeling of point clouds in virtual reality , 2017, SIGGRAPH ASIA.

[46]  Ming Hao,et al.  Linked Dynamic Graph CNN: Learning on Point Cloud via Linking Hierarchical Features , 2019, ArXiv.

[47]  Felix Heide,et al.  IDEAL: Image DEnoising AcceLerator , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[48]  Mahmut T. Kandemir,et al.  Déjà View: Spatio-Temporal Compute Reuse for‘ Energy-Efficient 360° VR Video Streaming , 2020, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).

[49]  Leonidas J. Guibas,et al.  Volumetric and Multi-view CNNs for Object Classification on 3D Data , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Marc Levoy,et al.  QSplat: a multiresolution point rendering system for large meshes , 2000, SIGGRAPH.

[51]  Matthias Zwicker,et al.  Surfels: surface elements as rendering primitives , 2000, SIGGRAPH.

[52]  Yuhao Zhu,et al.  Energy-Efficient Video Processing for Virtual Reality , 2019, 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA).

[53]  Gernot Riegler,et al.  OctNet: Learning Deep 3D Representations at High Resolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Nikos Komodakis,et al.  Dynamic Edge-Conditioned Filters in Convolutional Neural Networks on Graphs , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).