ApproxEye: Enabling approximate computation reuse for microrobotic computer vision

Aimed at solving real-life problems, microrobotic systems have attracted increasing attention. However, the limited performance achievable by microrobotic systems prevents them from carrying out complex tasks. Current research proposes customized designs for different applications and incorporates dedicated accelerators for high energy efficiency. However, not only do such techniques require significant manual effort and application-specific expertise, but the accelerator itself also occupies a non-negligible amount of chip resources. In this paper, we propose ApproxEye, a partial approximate computation reuse framework to accelerate microrobotic computer vision. Leveraging computation locality, ApproxEye reuses the results of previous “similar” computations to eliminate redundant computation. To exploit as many reuse opportunities as possible, ApproxEye 1) heuristically determines the optimal reuse granularity and 2) applies adaptive reuse requirements to different computations. Moreover, to reduce the latency of computation reuse, ApproxEye employs a tailored search scheme, implemented in parallel, for approximate computation reuse. Experimental results show that ApproxEye effectively exploits the potential of computation reuse and achieves a 57.05% speedup on average.
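To make the idea of approximate computation reuse concrete, the following is a minimal Python sketch of approximate memoization. It is an illustration under stated assumptions, not ApproxEye's actual design. It assumes that two inputs are "similar" when every element differs by at most a per-computation tolerance, that each computation has its own table with its own tolerance (a stand-in for adaptive reuse requirements), and that entries are evicted in least-recently-used order. The class `ApproxReuseTable`, its parameters, and the example function `weighted_sum` are hypothetical. The sequential scan in `_lookup` models the comparison that a hardware reuse table (for example, one built on content-addressable memory) would perform against all entries in parallel.

```python
from collections import OrderedDict
from typing import Callable, Optional, Sequence, Tuple

Key = Tuple[float, ...]


class ApproxReuseTable:
    """Hypothetical approximate memoization table (illustrative only)."""

    def __init__(self, compute: Callable[[Key], float],
                 tolerance: float, capacity: int = 64) -> None:
        self.compute = compute          # the computation whose results are reused
        self.tolerance = tolerance      # assumed per-computation similarity bound
        self.capacity = capacity        # maximum number of stored entries
        self.entries: "OrderedDict[Key, float]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _lookup(self, key: Key) -> Optional[Key]:
        # Sequential stand-in for a parallel search: return the first stored
        # input whose elements all lie within `tolerance` of the new input.
        for stored in self.entries:
            if len(stored) == len(key) and all(
                    abs(a - b) <= self.tolerance for a, b in zip(stored, key)):
                return stored
        return None

    def __call__(self, inputs: Sequence[float]) -> float:
        key = tuple(inputs)
        match = self._lookup(key)
        if match is not None:
            # Hit: skip the computation and reuse the stored (approximate) result.
            self.hits += 1
            self.entries.move_to_end(match)  # mark as most recently used
            return self.entries[match]
        # Miss: compute exactly, then store the result for future reuse.
        self.misses += 1
        result = self.compute(key)
        self.entries[key] = result
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used entry
        return result


def weighted_sum(px: Key) -> float:
    # Hypothetical 1-D smoothing kernel applied to three neighboring pixels.
    return 0.25 * px[0] + 0.5 * px[1] + 0.25 * px[2]


if __name__ == "__main__":
    table = ApproxReuseTable(weighted_sum, tolerance=2.0)
    print(table((100.0, 120.0, 110.0)))  # miss, computed: 112.5
    print(table((102.0, 121.0, 111.0)))  # hit, reused: 112.5 (exact: 113.75)
    print(table((130.0, 120.0, 110.0)))  # miss, computed: 120.0
    print(table.hits, table.misses)      # 1 2
```

With `tolerance=2.0`, the second call reuses the stored 112.5 even though exact evaluation would give 113.75. That error is the price of skipping the computation. A tighter tolerance yields fewer hits but smaller error, which is why a single global threshold is unlikely to suit every computation.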
