Towards Ubiquitous Low-power Image Processing Platforms

The development of power-efficient solutions lets new embedded products analyse images, bringing more intelligence to embedded systems: more and better services as well as advanced capabilities such as self-adaptation and autonomy. This will allow cars to drive more safely, medical devices to assist surgeons, and autonomous drones to find people who have become lost. For small-series products, one needs to find an embedded platform that provides enough performance, does not exceed the target price, and has sufficiently low power consumption. As these requirements typically conflict, image processing engineers spend considerable time identifying the best possible trade-off for their algorithm implementation on the chosen platform. Providing a common platform that allows the efficient implementation of image processing systems across diverse application domains, a key objective of our Tulipp project, requires a solid understanding of the constraints and challenges of each domain. In this paper, we report the key challenges we identified within the medical, Unmanned Aerial Vehicle (UAV), and automotive domains to aid the community in developing the next generation of embedded image processing systems.

[1]  Stanley R. Sternberg Parallel architectures for image processing , 1979, COMPSAC.

[2]  Keiichi Abe,et al.  New Fusion Operations for Digitized Binary Images and Their Applications , 1985, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Geoffrey E. Hinton,et al.  Learning internal representations by error propagation , 1986 .

[4]  John F. Canny,et al.  A Computational Approach to Edge Detection , 1986, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  R. W. Brodersen,et al.  Architectures and design techniques for real-time image-processing IC's , 1987 .

[6]  Christopher G. Harris,et al.  A Combined Corner and Edge Detector , 1988, Alvey Vision Conference.

[7]  E. Bilgen,et al.  Effects of Heat Intensity, Size, and Position of the Components on Temperature Distribution Within an Electronic PCB Enclosure , 1990 .

[8]  Stephen W. Marshall,et al.  Factors affecting work-related injury among forestry workers: A review , 1993 .

[9]  Ramin Zabih,et al.  Non-parametric Local Transforms for Computing Visual Correspondence , 1994, ECCV.

[10]  M. Horowitz,et al.  Low-power digital design , 1994, Proceedings of 1994 IEEE Symposium on Low Power Electronics.

[11]  P.M. Athanas,et al.  Real-Time Image Processing on a Custom Computing Platform , 1995, Computer.

[12]  Richard W. Conners,et al.  A MOdular and Reprogrammable Real-time Processing Hardware, MORRPH , 1995, Proceedings IEEE Symposium on FPGAs for Custom Computing Machines.

[13]  Hugo De Man,et al.  Power exploration for data dominated video applications , 1996, ISLPED '96.

[14]  Gordon J. Brebner,et al.  A Virtual Hardware Operating System for the Xilinx XC6200 , 1996, FPL.

[15]  Daniel Svozil,et al.  Introduction to multi-layer feed-forward neural networks , 1997 .

[16]  H. De Man,et al.  System-level power exploration for MPEG-2 decoder on embedded cores: a systematic approach , 1997, 1997 IEEE Workshop on Signal Processing Systems. SiPS 97 Design and Implementation formerly VLSI Signal Processing.

[17]  F. Gougeon Robotic vision in a regenerating forest environment , 1997 .

[18]  David Heckerman,et al.  Models and Selection Criteria for Regression and Classification , 1997, UAI.

[19]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[20]  L. Dagum,et al.  OpenMP: an industry standard API for shared-memory programming , 1998 .

[21]  M. F. Bowen Handel-c language reference manual , 1998 .

[22]  J.-Y. Bouguet,et al.  Pyramidal implementation of the lucas kanade feature tracker , 1999 .

[23]  Yoav Freund,et al.  A Short Introduction to Boosting , 1999 .

[24]  J. Kruger,et al.  Unskilled and unaware of it: how difficulties in recognizing one's own incompetence lead to inflated self-assessments. , 1999, Journal of personality and social psychology.

[25]  Zhengyou Zhang,et al.  A Flexible New Technique for Camera Calibration , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[26]  Giovanni Ulivi,et al.  An outdoor navigation system using GPS and inertial platform , 2001, 2001 IEEE/ASME International Conference on Advanced Intelligent Mechatronics. Proceedings (Cat. No.01TH8556).

[27]  Andrew D. Back,et al.  A spiking neural network architecture for nonlinear function approximation , 2001, Neural Networks.

[28]  Roland Siegwart,et al.  Innovative design for wheeled locomotion in rough terrain , 2002, Robotics Auton. Syst..

[29]  R J Cook,et al.  Video‐rate confocal endoscopy , 2002, Journal of microscopy.

[30]  Mahmut T. Kandemir,et al.  Leakage Current: Moore's Law Meets Static Power , 2003, Computer.

[31]  Eugene M. Izhikevich,et al.  Simple model of spiking neurons , 2003, IEEE Trans. Neural Networks.

[32]  Gunnar Farnebäck,et al.  Two-Frame Motion Estimation Based on Polynomial Expansion , 2003, SCIA.

[33]  Lloyd W. Massengill,et al.  Basic mechanisms and modeling of single-event upset in digital microelectronics , 2003 .

[34]  F. Ghozzi,et al.  Hardware platform design for real-time video applications , 2004, Proceedings. The 16th International Conference on Microelectronics, 2004. ICM 2004..

[35]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[36]  Sei-Wang Chen,et al.  Video stabilization for a camcorder mounted on a moving vehicle , 2004, IEEE Transactions on Vehicular Technology.

[37]  Robert Günzel,et al.  From TLM to FPGA: rapid prototyping with SystemC and transaction level modeling , 2005, Proceedings. 2005 IEEE International Conference on Field-Programmable Technology, 2005..

[38]  Stefan Hinz,et al.  Fast and subpixel precise blob detection and attribution , 2005, IEEE International Conference on Image Processing 2005.

[39]  Alejandro Linares-Barranco,et al.  Test Infrastructure for Address-Event-Representation Communications , 2005, IWANN.

[40]  Kunle Olukotun,et al.  The Future of Microprocessors , 2005, ACM Queue.

[41]  Jocelyn Sérot,et al.  Embedded Early Vision systems: implementation proposal and Hardware Architecture , 2005 .

[42]  James E. Smith,et al.  Virtual machines - versatile platforms for systems and processes , 2005 .

[43]  Francesco Regazzoni,et al.  Hardware/software partitioning of operating systems: a behavioral synthesis approach , 2006, ACM Great Lakes Symposium on VLSI.

[44]  Alonzo Kelly,et al.  Toward Reliable Off Road Autonomous Vehicles Operating in Challenging Environments , 2006, Int. J. Robotics Res..

[45]  Javier Díaz,et al.  FPGA-based real-time optical-flow system , 2006, IEEE Transactions on Circuits and Systems for Video Technology.

[46]  Jim Stevens,et al.  Hthreads: A Computational Model for Reconfigurable Devices , 2006, 2006 International Conference on Field Programmable Logic and Applications.

[47]  Miriam Leeser,et al.  Automatic Sliding Window Operation Optimization for FPGA-Based , 2006, 2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.

[48]  Tom Drummond,et al.  Machine Learning for High-Speed Corner Detection , 2006, ECCV.

[49]  H. Hirschmüller Ieee Transactions on Pattern Analysis and Machine Intelligence 1 Stereo Processing by Semi-global Matching and Mutual Information , 2022 .

[50]  Francky Catthoor,et al.  Storage Estimation and Design Space Exploration Methodologies for the Memory Management of Signal Processing Applications , 2008, J. Signal Process. Syst..

[51]  Steven S. Lumetta,et al.  HybridOS: runtime support for reconfigurable accelerators , 2008, FPGA '08.

[52]  Jan-Michael Frahm,et al.  A Comparative Analysis of RANSAC Techniques Leading to Adaptive Real-Time Random Sample Consensus , 2008, ECCV.

[53]  Tobias Delbrück,et al.  Frame-free dynamic digital vision , 2008 .

[54]  Christophe Clienti,et al.  A system on chip dedicated to pipeline neighborhood processing for Mathematical Morphology , 2008, 2008 16th European Signal Processing Conference.

[55]  Ines Ernst,et al.  Mutual Information Based Semi-Global Stereo Matching on the GPU , 2008, ISVC.

[56]  Stephen Neuendorffer,et al.  Demystifying the Lucas-Kanade Optical Flow Algorithm with Vivado HLS , 2009 .

[57]  Isaac N. Bankman,et al.  Handbook of medical image processing and analysis , 2009 .

[58]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[59]  Sanjit A. Seshia,et al.  Introduction to Embedded Systems , 2009 .

[60]  Srihari Cadambi,et al.  A Massively Parallel Coprocessor for Convolutional Neural Networks , 2009, 2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors.

[61]  Christopher Hunt,et al.  Notes on the OpenSURF Library , 2009 .

[62]  Magnus Jahre,et al.  A Quantitative Study of Memory System Interference in Chip Multiprocessor Architectures , 2009, 2009 11th IEEE International Conference on High Performance Computing and Communications.

[63]  Raj Seelam I/O Design Flexibility with the FPGA Mezzanine Card (FMC) , 2009 .

[64]  Adrian Park,et al.  Designing Modular Hardware Accelerators in C with ROCCC 2.0 , 2010, 2010 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines.

[65]  John E. Stone,et al.  OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems , 2010, Computing in Science & Engineering.

[66]  Vincent Lepetit,et al.  BRIEF: Binary Robust Independent Elementary Features , 2010, ECCV.

[67]  Angel Jiménez-Fernandez,et al.  On the AER convolution processors for FPGA , 2010, Proceedings of 2010 IEEE International Symposium on Circuits and Systems.

[68]  Peter Pirsch,et al.  Real-time stereo vision system using semi-global matching disparity estimation: Architecture and FPGA-implementation , 2010, 2010 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation.

[69]  Bertil Schmidt,et al.  Bioinformatics: High Performance Parallel Computer Architectures , 2010 .

[70]  Eugene C Lin,et al.  Radiation risk from medical imaging. , 2010, Mayo Clinic proceedings.

[71]  Sven Behnke,et al.  Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition , 2010, ICANN.

[72]  Maki K. Habib,et al.  Robot-Assisted Risky Intervention, Search, Rescue and Environmental Surveillance , 2010 .

[73]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[74]  Tom Drummond,et al.  Faster and Better: A Machine Learning Approach to Corner Detection , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[75]  K Lakshmanan,et al.  Scheduling Parallel Real-Time Tasks on Multi-core Processors , 2010, 2010 31st IEEE Real-Time Systems Symposium.

[76]  Lesley Shannon,et al.  FUSE: Front-End User Framework for O/S Abstraction of Hardware Accelerators , 2011, 2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines.

[77]  Jürgen Becker,et al.  RAMPSoCVM: Runtime Support and Hardware Virtualization for a Runtime Adaptive MPSoC , 2011, 2011 21st International Conference on Field Programmable Logic and Applications.

[78]  Sejung Yang,et al.  A Novel 3-D Color Histogram Equalization Method With Uniform 1-D Gray Scale Histogram , 2011, IEEE Transactions on Image Processing.

[79]  Jason Helge Anderson,et al.  LegUp: high-level synthesis for FPGA-based processor/accelerator systems , 2011, FPGA '11.

[80]  Jürgen Becker,et al.  Operating System for Runtime Reconfigurable Multiprocessor Systems , 2011, Int. J. Reconfigurable Comput..

[81]  Benoît Miramond,et al.  Dataflow programming model for reconfigurable computing , 2011, 6th International Workshop on Reconfigurable Communication-Centric Systems-on-Chip (ReCoSoC).

[82]  Andrew J. Davison,et al.  DTAM: Dense tracking and mapping in real-time , 2011, 2011 International Conference on Computer Vision.

[83]  Magnus Jahre,et al.  A High Performance Adaptive Miss Handling Architecture for Chip Multiprocessors , 2011, Trans. High Perform. Embed. Archit. Compil..

[84]  Roland Siegwart,et al.  BRISK: Binary Robust invariant scalable keypoints , 2011, 2011 International Conference on Computer Vision.

[85]  Gary R. Bradski,et al.  ORB: An efficient alternative to SIFT or SURF , 2011, 2011 International Conference on Computer Vision.

[86]  Wayne Luk,et al.  Power profiling and optimization for heterogeneous multi-core systems , 2011, CARN.

[87]  Saurabh Bagchi,et al.  Aveksha: a hardware-software approach for non-intrusive tracing and profiling of wireless embedded systems , 2011, SenSys.

[88]  Shuvra S. Bhattacharyya,et al.  Dataflow-based Design and Implementation of Image Processing Applications , 2011 .

[89]  Marco Platzner,et al.  Memory Virtualization for Multithreaded Reconfigurable Hardware , 2011, 2011 21st International Conference on Field Programmable Logic and Applications.

[90]  John Wawrzynek,et al.  Chisel: Constructing hardware in a Scala embedded language , 2012, DAC Design Automation Conference 2012.

[91]  Ulrik Pagh Schultz,et al.  HartOS - A hardware implemented RTOS for hard real-time applications , 2012, PDeS.

[92]  Greg Brown,et al.  A performance and energy comparison of FPGAs, GPUs, and multicores for sliding-window applications , 2012, FPGA '12.

[93]  Efraim Rotem,et al.  Power-Management Architecture of the Intel Microarchitecture Code-Named Sandy Bridge , 2012, IEEE Micro.

[94]  P. Strobl,et al.  Comprehensive Monitoring of Wildfires in Europe: The European Forest Fire Information System (EFFIS) , 2012 .

[95]  I. Bahri,et al.  HW-SW Real-Time Operating system for AC drive applications , 2012, International Symposium on Power Electronics Power Electronics, Electrical Drives, Automation and Motion.

[96]  Christopher D. Gill,et al.  Improving System Predictability and Performance via Hardware Accelerated Data Structures , 2012, ICCS.

[97]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[98]  Kang G. Shin,et al.  Profiling Software for Energy Consumption , 2012, 2012 IEEE International Conference on Green Computing and Communications.

[99]  Gilberto Ochoa Ruiz A high-level methodology for automatically generating dynamically reconfigurable systems using IP-XACT and the UML MARTE profile. (Méthodologie de conception de haut niveau pour la génération automatique des systèmes dynamiquement reconfigurables en utilisant IP-XACT et le profil UML MARTE) , 2013 .

[100]  Hiroaki Takada,et al.  Rainbow: An Operating System for Software-Hardware Multitasking on Dynamically Partially Reconfigurable FPGAs , 2013, Int. J. Reconfigurable Comput..

[101]  Johannes Stallkamp,et al.  Real-time stereo vision: Optimizing Semi-Global Matching , 2013, 2013 IEEE Intelligent Vehicles Symposium (IV).

[102]  Jason Helge Anderson,et al.  LegUp: An open-source high-level synthesis tool for FPGA-based processor/accelerator systems , 2013, TECS.

[103]  Paul H. J. Kelly,et al.  SLAM++: Simultaneous Localisation and Mapping at the Level of Objects , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[104]  Ryan Kastner,et al.  RIFFA 2.0: A reusable integration framework for FPGA accelerators , 2013, 2013 23rd International Conference on Field programmable Logic and Applications.

[105]  Yaoqin Xie,et al.  Feature and Contrast Enhancement of Mammographic Image Based on Multiscale Analysis and Morphology , 2013, 2013 IEEE International Conference on Information and Automation (ICIA).

[106]  David F. Bacon,et al.  FPGA programming for the masses , 2013, CACM.

[107]  Fabrizio Ferrandi,et al.  Bambu: A modular framework for the high level synthesis of memory-intensive applications , 2013, 2013 23rd International Conference on Field programmable Logic and Applications.

[108]  Frédo Durand,et al.  Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines , 2013, PLDI 2013.

[109]  Luka Daoud,et al.  A Survey of High Level Synthesis Languages, Tools, and Compilers for Reconfigurable High Performance Computing , 2013, ICSS.

[110]  Enrique S. Quintana-Ortí,et al.  An Integrated Framework for Power-Performance Analysis of Parallel Scientific Workloads , 2013 .

[111]  Sanjay Misra,et al.  Reconfiguration approaches in Wireless Sensor Network: Issues and challenges , 2013, 2013 IEEE International Conference on Emerging & Sustainable Technologies for Power & ICT in a Developing Society (NIGERCON).

[112]  Holger Blume,et al.  Parallel implementation of real-time semi-global matching on embedded multi-core architectures , 2013, 2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS).

[113]  Raúl Rojas,et al.  Large scale Semi-Global Matching on the CPU , 2014, 2014 IEEE Intelligent Vehicles Symposium Proceedings.

[114]  Jonathan W. Valvano Embedded Systems - Shape The World , 2014 .

[115]  Celeste O. A. Coelho,et al.  A look at forest fires in Portugal: technical, institutional, and social perceptions , 2014 .

[116]  Magnus Jahre,et al.  An energy efficient column-major backend for FPGA SpMV accelerators , 2014, 2014 IEEE 32nd International Conference on Computer Design (ICCD).

[117]  Pat Hanrahan,et al.  Darkroom , 2014, ACM Trans. Graph..

[118]  Daniel Cremers,et al.  LSD-SLAM: Large-Scale Direct Monocular SLAM , 2014, ECCV.

[119]  Jürgen Teich,et al.  Code generation from a domain-specific language for C-based HLS of hardware accelerators , 2014, 2014 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).

[120]  Marco Platzner,et al.  ReconOS: An Operating System Approach for Reconfigurable Computing , 2014, IEEE Micro.

[121]  Jorge Pereira,et al.  Co-Designed FreeRTOS Deployed on FPGA , 2014, 2014 Brazilian Symposium on Computing Systems Engineering.

[122]  Uday Bondhugula,et al.  PolyMage: Automatic Optimization for Image Processing Pipelines , 2015, ASPLOS.

[123]  Alan D. George,et al.  Comparative analysis of OpenCL vs. HDL with image-processing kernels on Stratix-V FPGA , 2015, 2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP).

[124]  Kunle Olukotun,et al.  Generating Configurable Hardware from Parallel Patterns , 2015, ASPLOS.

[125]  Wolfram Burgard,et al.  Traversability analysis for mobile robots in outdoor environments: A semi-supervised learning approach based on 3D-lidar data , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[126]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[127]  J. M. M. Montiel,et al.  ORB-SLAM: A Versatile and Accurate Monocular SLAM System , 2015, IEEE Transactions on Robotics.

[128]  Jorge Cabral,et al.  Towards an FPGA-based edge device for the Internet of Things , 2015, 2015 IEEE 20th Conference on Emerging Technologies & Factory Automation (ETFA).

[129]  Dionisios N. Pnevmatikatos,et al.  Hardware Task Scheduling for Partially Reconfigurable FPGAs , 2015, ARC.

[130]  Yu Wang,et al.  Real-Time High-Quality Stereo Vision System in FPGA , 2013, IEEE transactions on circuits and systems for video technology (Print).

[131]  Piotr Bialas,et al.  Benchmarking the Cost of Thread Divergence in CUDA , 2015, PPAM.

[132]  Yoshua Bengio,et al.  BinaryConnect: Training Deep Neural Networks with binary weights during propagations , 2015, NIPS.

[133]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[134]  A. Bab-Hadiashar,et al.  An Overview to Visual Odometry and Visual SLAM: Applications to Mobile Robotics , 2015 .

[135]  Andreas Traber,et al.  Preemptive Hardware Multitasking in ReconOS , 2015, ARC.

[136]  Florent de Dinechin,et al.  Hardware Implementations of Fixed-Point Atan2 , 2015, 2015 IEEE 22nd Symposium on Computer Arithmetic.

[137]  Matthew French,et al.  A unified hardware/software MPSoC system construction and run-time framework , 2015, 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[138]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[139]  Luca Benini,et al.  Optimizing memory bandwidth exploitation for OpenVX applications on embedded many-core accelerators , 2015, Journal of Real-Time Image Processing.

[140]  Timo Aila,et al.  Pruning Convolutional Neural Networks for Resource Efficient Transfer Learning , 2016, ArXiv.

[141]  Daniel Rueckert,et al.  Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[142]  Suhaib A. Fahmy,et al.  Mapping for Maximum Performance on FPGA DSP Blocks , 2016, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[143]  Pat Hanrahan,et al.  Rigel , 2016, ACM Trans. Graph..

[144]  Kunle Olukotun,et al.  Automatic Generation of Efficient Accelerators for Reconfigurable Hardware , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[145]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[146]  Grzegorz Bieszczad SoC-FPGA embedded system for real-time thermal image processing , 2016, 2016 MIXDES - 23rd International Conference Mixed Design of Integrated Circuits and Systems.

[147]  Sachin S. Talathi,et al.  Fixed Point Quantization of Deep Convolutional Networks , 2015, ICML.

[148]  Bernd Klauer,et al.  Operating System Concepts for Reconfigurable Computing: Review and Survey , 2016, Int. J. Reconfigurable Comput..

[149]  Wei Zhang,et al.  A performance analysis framework for optimizing OpenCL applications on FPGAs , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[150]  Diana Göhringer,et al.  Enabling dynamic and partial reconfiguration in Xilinx SDSoC , 2016, 2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig).

[151]  Fabien Marty,et al.  TULIPP: Towards ubiquitous low-power image processing platforms , 2016, 2016 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS).

[152]  Michael Milford,et al.  Supervised and Unsupervised Linear Learning Techniques for Visual Place Recognition in Changing Environments , 2016, IEEE Transactions on Robotics.

[153]  Gary R. Bradski,et al.  Learning OpenCV 3: Computer Vision in C++ with the OpenCV Library , 2016 .

[154]  Kari Pulli,et al.  OpenVX: a framework for accelerating computer vision , 2016, SIGGRAPH ASIA Courses.

[155]  Vamsi Boppana,et al.  A 16-nm Multiprocessing System-on-Chip Field-Programmable Gate Array Platform , 2016, IEEE Micro.

[156]  Diana Göhringer,et al.  LinROS: A Linux-Based Runtime System for Reconfigurable MPSoCs , 2016, 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).

[157]  Ali Farhadi,et al.  XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks , 2016, ECCV.

[158]  Mats Brorsson,et al.  Grain graphs: OpenMP performance analysis made easy , 2016, PPoPP.

[159]  Magnus Jahre,et al.  Efficient control flow restructuring for GPUs , 2016, 2016 International Conference on High Performance Computing & Simulation (HPCS).

[160]  M. S. Ali,et al.  Hardware Support for Adaptive Task Scheduler in RTOS , 2016 .

[161]  Marco Platzner,et al.  Programming models for reconfigurable manycore systems , 2016, 2016 11th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC).

[162]  Andrew S. Cassidy,et al.  Conversion of artificial recurrent neural networks to spiking neural networks for low-power neuromorphic hardware , 2016, 2016 IEEE International Conference on Rebooting Computing (ICRC).

[163]  Jason Cong,et al.  Caffeine: Towards uniformed representation and acceleration for deep convolutional neural networks , 2016, 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[164]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[165]  Chandra Sekar,et al.  Tutorial T7: Designing with Xilinx SDSoC , 2017, 2017 30th International Conference on VLSI Design and 2017 16th International Conference on Embedded Systems (VLSID).

[166]  Christof Koch,et al.  Generalized leaky integrate-and-fire models classify multiple neuron types , 2017, Nature Communications.

[167]  Qining Wang,et al.  A Real-Time Intent Recognition System Based on SoC-FPGA for Robotic Transtibial Prosthesis , 2017, ICIRA.

[168]  Federico Tombari,et al.  CNN-SLAM: Real-Time Dense Monocular SLAM with Learned Depth Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[169]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[170]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[171]  Philip Heng Wai Leong,et al.  FINN: A Framework for Fast, Scalable Binarized Neural Network Inference , 2016, FPGA.

[172]  Diana Göhringer,et al.  Exploration of OpenCL for FPGAs using SDAccel and comparison to GPUs and multicore CPUs , 2017, 2017 27th International Conference on Field Programmable Logic and Applications (FPL).

[173]  Martin Schoeberl,et al.  A Controller for Dynamic Partial Reconfiguration in FPGA-Based Real-Time Systems , 2017, 2017 IEEE 20th International Symposium on Real-Time Distributed Computing (ISORC).

[174]  Chunming Li,et al.  Global and Local Information Based Deep Network for Skin Lesion Segmentation , 2017, ArXiv.

[175]  Vijayalakshmi Srinivasan,et al.  Needle: Leveraging Program Analysis to Analyze and Extract Accelerators from Whole Programs , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[176]  Sándor P. Fekete,et al.  Resource-efficient dynamic partial reconfiguration on FPGAs for space instruments , 2017, 2017 NASA/ESA Conference on Adaptive Hardware and Systems (AHS).

[177]  Yurong Liu,et al.  A survey of deep neural network architectures and their applications , 2017, Neurocomputing.

[178]  Wonyong Sung,et al.  Structured Pruning of Deep Convolutional Neural Networks , 2015, ACM J. Emerg. Technol. Comput. Syst..

[179]  Bo Yu,et al.  FPGA-based ORB feature extraction for real-time visual SLAM , 2017, 2017 International Conference on Field Programmable Technology (ICFPT).

[180]  Juan D. Tardós,et al.  ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras , 2016, IEEE Transactions on Robotics.

[181]  Xi Chen,et al.  FP-DNN: An Automated Framework for Mapping Deep Neural Networks onto FPGAs with RTL-HLS Hybrid Templates , 2017, 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).

[182]  Sei Ikeda,et al.  Visual SLAM algorithms: a survey from 2010 to 2016 , 2017, IPSJ Transactions on Computer Vision and Applications.

[183]  Xudong Jiang,et al.  Aggregating Deep Convolutional Features for Melanoma Recognition in Dermoscopy Images , 2017, MLMI@MICCAI.

[184]  Eriko Nurvitadhi,et al.  Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Neural Networks? , 2017, FPGA.

[185]  Shih-Chii Liu,et al.  Conversion of Continuous-Valued Deep Networks to Efficient Event-Driven Networks for Image Classification , 2017, Front. Neurosci..

[186]  I. El Hajjouji,et al.  FPGA-based implementation of optical flow algorithm , 2017, 2017 International Conference on Electrical and Information Technologies (ICEIT).

[187]  Ananya Muddukrishna,et al.  Extending OMPT to Support Grain Graphs , 2017, IWOMP.

[188]  Ananya Muddukrishna,et al.  Supporting Utilities for Heterogeneous Embedded Image Processing Platforms (STHEM): An Overview , 2018, ARC.

[189]  James C. Hoe,et al.  Time-Shared Execution of Realtime Computer Vision Pipelines by Dynamic Partial Reconfiguration , 2018, 2018 28th International Conference on Field Programmable Logic and Applications (FPL).

[190]  Yu Wang,et al.  Angel-Eye: A Complete Design Flow for Mapping CNN Onto Embedded FPGA , 2018, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[191]  Matti Siekkinen,et al.  Latency and throughput characterization of convolutional neural networks for mobile computer vision , 2018, MMSys.

[192]  Martin Margala,et al.  Exploration of Low Numeric Precision Deep Learning Inference Using Intel® FPGAs , 2018, 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).

[193]  Guy Pe'er,et al.  Agricultural policy can reduce wildfires , 2018, Science.

[194]  Martin C. Herbordt,et al.  Real-time data analysis for medical diagnosis using FPGA-accelerated neural networks , 2018, BMC Bioinformatics.

[195]  Yingwei Luo,et al.  Get Out of the Valley: Power-Efficient Address Mapping for GPUs , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[196]  Tughrul Arslan,et al.  R3TOS-Based Integrated Modular Space Avionics for On-Board Real-Time Data Processing , 2018, 2018 NASA/ESA Conference on Adaptive Hardware and Systems (AHS).

[197]  Jean-François Nezan,et al.  A Distributed Framework for Low-Latency OpenVX over the RDMA NoC of a Clustered Manycore , 2018, 2018 IEEE High Performance extreme Computing Conference (HPEC).

[198]  Benoît Miramond,et al.  Confronting machine-learning with neuroscience for neuromorphic architectures design , 2018, 2018 International Joint Conference on Neural Networks (IJCNN).

[199]  Lieven Eeckhout,et al.  GDP: Using Dataflow Properties to Accurately Estimate Interference-Free Performance at Runtime , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[200]  Micael S. Couceiro,et al.  Swarming in forestry environments: collective exploration and network deployment , 2018 .

[201]  Harald Michalik,et al.  Hardware Acceleration in Genode OS Using Dynamic Partial Reconfiguration , 2018, ARCS.

[202]  Xiangyu Zhang,et al.  ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[203]  Muhammad Ali,et al.  Low Power Image Processing Applications on FPGAs Using Dynamic Voltage Scaling and Partial Reconfiguration , 2018, 2018 Conference on Design and Architectures for Signal and Image Processing (DASIP).

[204]  Martin Margala,et al.  Exploration of Low Numeric Precision Deep Learning Inference Using Intel® FPGAs , 2018, FCCM.

[205]  Marcin Kowalczyk,et al.  Real-Time Implementation of Contextual Image Processing Operations for 4K Video Stream in Zynq UltraScale+ MPSoC , 2018, 2018 Conference on Design and Architectures for Signal and Image Processing (DASIP).

[206]  Alexander V. Veidenbaum,et al.  Acceleration Framework for FPGA Implementation of OpenVX Graph Pipelines , 2018, 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).

[207]  Diana Göhringer,et al.  Full-HD Accelerated and Embedded Feature Detection Video System with 63fps using ORB for FREAK , 2018, 2018 International Conference on ReConFigurable Computing and FPGAs (ReConFig).

[208]  Andrea Bartolini,et al.  The D.A.V.I.D.E. big-data-powered fine-grain power and performance monitoring support , 2018, CF.

[209]  Hossein Omidian,et al.  An Accelerated OpenVX Overlay for Pure Software Programmers , 2018, 2018 International Conference on Field-Programmable Technology (FPT).

[210]  Patricia Balbastre Betoret,et al.  A Hypervisor Architecture for Low-Power Real-Time Embedded Systems , 2018, 2018 21st Euromicro Conference on Digital System Design (DSD).

[211]  Matthias Kollmann,et al.  Real-time on-board obstacle avoidance for UAVs based on embedded stereo vision , 2018, ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences.

[212]  Shi-Min Hu,et al.  Deep Online Video Stabilization , 2018, ArXiv.

[213]  Micael S. Couceiro,et al.  MoDSeM: Modular Framework for Distributed Semantic Mapping , 2019 .

[214]  Sparsh Mittal,et al.  A survey of techniques for optimizing deep learning on GPUs , 2019, J. Syst. Archit.

[215]  Diana Göhringer,et al.  HiFlipVX: An Open Source High-Level Synthesis FPGA Library for Image Processing , 2019, ARC.

[216]  Ricardo Tapiador-Morales,et al.  Neuromorphic LIF Row-by-Row Multiconvolution Processor for FPGA , 2019, IEEE Transactions on Biomedical Circuits and Systems.

[217]  Benoît Miramond,et al.  Information Coding and Hardware Architecture of Spiking Neural Networks , 2019, 2019 22nd Euromicro Conference on Digital System Design (DSD).

[218]  Phillip H. Jones,et al.  Comparing Energy Efficiency of CPU, GPU and FPGA Implementations for Vision Kernels , 2019, 2019 IEEE International Conference on Embedded Software and Systems (ICESS).

[219]  Micael S. Couceiro,et al.  SEMFIRE: Towards a new generation of forestry maintenance multi-robot systems , 2019, 2019 IEEE/SICE International Symposium on System Integration (SII).

[220]  Adewale Akinlawon Adetomi.  Dynamic reconfiguration frameworks for high-performance reliable real-time reconfigurable computing , 2019 .

[221]  Weizheng Wang,et al.  Development of convolutional neural network and its application in image classification: a survey , 2019, Optical Engineering.

[222]  Aleksandra Faust,et al.  Air Learning: An AI Research Platform for Algorithm-Hardware Benchmarking of Autonomous Aerial Robots , 2019, ArXiv.

[223]  Peter Y. K. Cheung,et al.  LUTNet: Rethinking Inference in FPGA Soft Logic , 2019, 2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).

[224]  Niall O'Mahony,et al.  Deep Learning vs. Traditional Computer Vision , 2019, CVC.

[225]  Micael S. Couceiro,et al.  MoDSeM: Towards Semantic Mapping with Distributed Robots , 2019, TAROS.

[226]  Kai Zhang,et al.  T-DLA: An Open-source Deep Learning Accelerator for Ternarized DNN Models on Embedded FPGA , 2019, 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI).

[227]  Massimo Violante,et al.  RTOS Solution for NoC-Based COTS MPSoC Usage in Mixed-Criticality Systems , 2019, J. Electron. Test.

[228]  Yu Wang,et al.  [DL] A Survey of FPGA-based Neural Network Inference Accelerators , 2019, ACM Trans. Reconfigurable Technol. Syst.

[229]  Alexander V. Veidenbaum,et al.  AFFIX: Automatic Acceleration Framework for FPGA Implementation of OpenVX Vision Algorithms , 2019, FPGA.

[230]  Ken Sakurada,et al.  OpenVSLAM: A Versatile Visual SLAM Framework , 2019, ACM Multimedia.

[231]  Christos-Savvas Bouganis,et al.  fpgaConvNet: Mapping Regular and Irregular Convolutional Neural Networks on FPGAs , 2019, IEEE Transactions on Neural Networks and Learning Systems.

[232]  Xiao Liu,et al.  DF-SLAM: A Deep-Learning Enhanced Visual SLAM System based on Deep Local Features , 2019, ArXiv.

[233]  Marco Platzner,et al.  An Approach for Mapping Periodic Real-Time Tasks to Reconfigurable Hardware , 2019, 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).

[234]  Apala Guha,et al.  μIR - An intermediate representation for transforming and optimizing the microarchitecture of application accelerators , 2019, MICRO.

[235]  Sébastien Bilavarn,et al.  An FPGA-Based Hybrid Neural Network Accelerator for Embedded Satellite Image Classification , 2020, 2020 IEEE International Symposium on Circuits and Systems (ISCAS).

[236]  Diana Göhringer,et al.  Resource Efficient Dynamic Voltage and Frequency Scaling on Xilinx FPGAs , 2020, ARC.

[237]  Haibin Shen,et al.  An Efficient Hardware Accelerator for Structured Sparse Convolutional Neural Networks on FPGAs , 2020, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[238]  Qiang Shen,et al.  Real-Time Image Stabilization Method Based on Optical Flow and Binary Point Feature Matching , 2020 .

[239]  O. Fatemi,et al.  DCMI: A Scalable Strategy for Accelerating Iterative Stencil Loops on FPGAs , 2020, ACM Trans. Archit. Code Optim.