Real-Time Object Detection and Recognition on Low-Compute Humanoid Robots using Deep Learning

We envision that in the near future, humanoid robots will share our home space and assist us in our daily and routine activities through object manipulation. One of the fundamental technologies that must be developed is the ability for these robots to detect and recognize objects, so that they can manipulate them effectively and make real-time decisions involving them. In this paper, we describe a novel architecture that enables multiple low-compute NAO robots to perform real-time detection, recognition and localization of objects in their camera view and to take programmable actions based on the detected objects. The proposed object detection and localization algorithm is an empirical modification of YOLOv3, tuned through indoor experiments in multiple scenarios to have a smaller weight size and lower computational requirements, and is paired with a distributed architecture that operates multiple robots from a central “inference engine”. YOLOv3 was chosen after a comparative study of bounding-box algorithms, with the objective of selecting one that best balances information retention, low inference time and high accuracy for real-time object detection and localization. Quantizing the weights and readjusting the filter sizes and layer arrangements for convolutions improved the inference time on low-resolution images from the robot's camera feed. The architecture also comprises an effective end-to-end pipeline that feeds real-time frames from the camera feed to the neural network and uses its results to guide the robot with customizable actions corresponding to the detected class labels.
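The pipeline described above, in which several robots send frames to one central inference engine and each detected class label is mapped to a customizable action, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Detection`, `InferenceEngine`, `stub_detector`, the robot identifiers and the action strings are all hypothetical, and the detector is a stub standing in for the modified YOLOv3 network.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Detection:
    label: str                       # detected class label
    confidence: float                # detector score in [0, 1]
    box: Tuple[int, int, int, int]   # x, y, width, height in pixels


# Assumption: any callable that turns raw frame bytes into detections.
Detector = Callable[[bytes], List[Detection]]


class InferenceEngine:
    """Hypothetical central engine shared by several robots."""

    def __init__(self, detector: Detector, actions: Dict[str, str],
                 min_confidence: float = 0.5) -> None:
        self.detector = detector
        self.actions = actions              # class label -> action name
        self.min_confidence = min_confidence

    def process(self, robot_id: str, frame: bytes) -> List[Tuple[str, str]]:
        """Run detection on one frame and return (robot_id, action) commands."""
        commands = []
        for det in self.detector(frame):
            if det.confidence < self.min_confidence:
                continue                    # drop low-confidence detections
            action = self.actions.get(det.label)
            if action is not None:          # ignore labels with no action
                commands.append((robot_id, action))
        return commands


def stub_detector(frame: bytes) -> List[Detection]:
    # Stand-in for the neural network: returns fixed detections.
    return [
        Detection("ball", 0.9, (10, 20, 30, 30)),
        Detection("cup", 0.3, (50, 60, 20, 40)),
    ]


engine = InferenceEngine(stub_detector, {"ball": "kick", "cup": "grasp"})

commands: List[Tuple[str, str]] = []
for robot_id in ("nao-1", "nao-2"):
    commands.extend(engine.process(robot_id, b"fake-frame"))

print(commands)
```

Keeping the label-to-action table outside the detector is what makes the actions customizable in this sketch: the mapping can change without retraining or touching the network. In a real deployment the frames would arrive over the network from each robot and the returned commands would be sent back to it.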
