Memory-Oriented Design-Space Exploration of Edge-AI Hardware for XR Applications

Low-power edge-AI capabilities are essential for on-device extended reality (XR) applications to support the vision of the Metaverse. In this work, we investigate two representative XR workloads, (i) hand detection and (ii) eye segmentation, for hardware design-space exploration. For both applications, we train deep neural networks and analyze the impact of quantization and hardware-specific bottlenecks. Through simulations, we evaluate a CPU and two systolic inference accelerator implementations. Next, we compare these hardware solutions at advanced technology nodes. We then evaluate the impact of integrating state-of-the-art emerging non-volatile memory technology (STT/SOT/VGSOT MRAM) into the XR-AI inference pipeline. We find that, for designs at the 7 nm node, introducing non-volatile memory into the memory hierarchy yields significant energy benefits (>=24%) for hand detection (IPS=10) and eye segmentation (IPS=0.1) while meeting the minimum inferences-per-second (IPS) requirement. Moreover, a substantial area reduction (>=30%) can be realized owing to the small form factor of MRAM compared to traditional SRAM.
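As a rough illustration of why the inference rate matters when weighing a non-volatile memory against SRAM, the sketch below charges each inference its dynamic energy plus the standby energy spent idling until the next inference. This is a simplified first-order model and not the evaluation flow used in this work; the function names and every numeric value are hypothetical placeholders, not measured results.

```python
def energy_per_inference_uj(dynamic_uj, standby_uw, latency_s, ips):
    """Energy attributed to one inference at a fixed rate `ips`.

    dynamic_uj : energy of compute and memory accesses per inference (uJ)
    standby_uw : standby (leakage) power while waiting for the next inference (uW)
    latency_s  : time one inference takes (s)
    """
    period_s = 1.0 / ips
    if latency_s > period_s:
        raise ValueError("design cannot sustain the target IPS")
    idle_s = period_s - latency_s
    return dynamic_uj + standby_uw * idle_s  # uW * s = uJ


def relative_saving(baseline, candidate, latency_s, ips):
    """Fractional energy saving of `candidate` over `baseline` at a given IPS."""
    e_base = energy_per_inference_uj(latency_s=latency_s, ips=ips, **baseline)
    e_cand = energy_per_inference_uj(latency_s=latency_s, ips=ips, **candidate)
    return 1.0 - e_cand / e_base


# Hypothetical placeholder numbers, chosen only to exercise the model:
# the non-volatile option pays more per access but leaks far less when idle.
sram_like = {"dynamic_uj": 50.0, "standby_uw": 500.0}
nvm_like = {"dynamic_uj": 60.0, "standby_uw": 5.0}

for ips in (10.0, 0.1):
    saving = relative_saving(sram_like, nvm_like, latency_s=0.02, ips=ips)
    print(f"IPS={ips}: relative saving = {saving:.1%}")
```

Under these assumptions the standby term grows with the idle time between inferences, so a low-leakage memory looks increasingly attractive as the required IPS drops. A real comparison would also need per-technology read and write energies, latencies, and area.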
