A Novel Deep Neural Network that Uses Space-Time Features for Tracking and Recognizing a Moving Object

Abstract

This work proposes a deep neural network (DNN) that reliably recognizes a chosen object, captured with a webcam, while it moves in 3D space. Autoencoding and substitutional reality are used to train a shallow network until it achieves zero tracking error in a discrete environment. The trained network is then placed in a real-world closed-loop system in which images coming from a webcam produce displacement information for a moving region of interest (ROI) within the image itself. This loop gives rise to an emergent tracking behavior that creates a self-sustaining flow of compressed space-time data. Short-term memory elements then play a key role by building new representations in the form of a space-time matrix. These representations are delivered as input to a second shallow network, which acts as a “recognizer”. A noise-balanced learning method is used to rapidly train the recognizer with real-world images. The result is a simple yet powerful robotic eye whose slender neural processor robustly tracks and recognizes the chosen object. The system has been tested with real images in real time.
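The abstract does not specify how the short-term memory elements or the noise-balanced learning method are implemented. The following is therefore only a minimal sketch under stated assumptions, not the authors' method. It assumes that each step of the tracking loop emits a small feature vector (hypothetically, the ROI displacement `dx, dy`). A fixed-depth buffer stacks the most recent vectors into a space-time matrix, and a single-layer logistic unit, a simplification of the paper's shallow recognizer network, is trained on the flattened matrix. Training uses Gaussian input-noise injection, which stands in here for noise-balanced learning. All names (`SpaceTimeMemory`, `train_recognizer`, `recognize`, `make_sequence`), the toy data, and the parameter values are hypothetical.

```python
from collections import deque
import math
import random


class SpaceTimeMemory:
    """Hypothetical short-term memory: keeps the last `depth` feature vectors
    produced by the tracking loop and exposes them as a depth x dim matrix."""

    def __init__(self, depth, dim):
        self.depth = depth
        self.dim = dim
        self.frames = deque(maxlen=depth)  # the oldest frame is dropped automatically

    def push(self, features):
        if len(features) != self.dim:
            raise ValueError(f"expected {self.dim} features, got {len(features)}")
        self.frames.append([float(x) for x in features])

    def matrix(self):
        # Rows are time steps (oldest first); zero rows pad an unfilled memory.
        padding = [[0.0] * self.dim for _ in range(self.depth - len(self.frames))]
        return padding + [row[:] for row in self.frames]

    def flat(self):
        # Row-major flattening: the input vector fed to the recognizer.
        return [x for row in self.matrix() for x in row]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def recognize(w, b, x):
    """Probability that the space-time input x belongs to the chosen object."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_recognizer(samples, labels, epochs=100, lr=0.5, noise_std=0.1, seed=0):
    """Single-layer logistic recognizer trained by SGD on cross-entropy, with
    fresh Gaussian noise added to every input presentation (an assumed
    stand-in for noise-balanced learning)."""
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            noisy = [xi + rng.gauss(0.0, noise_std) for xi in x]
            err = recognize(w, b, noisy) - y  # d(cross-entropy)/d(logit)
            w = [wi - lr * err * xi for wi, xi in zip(w, noisy)]
            b -= lr * err
    return w, b


def make_sequence(dx, jitter, rng, depth=4):
    """Hypothetical data: `depth` ROI displacements (dx, dy) pushed through memory."""
    mem = SpaceTimeMemory(depth=depth, dim=2)
    for _ in range(depth):
        mem.push((dx + rng.uniform(-jitter, jitter), rng.uniform(-jitter, jitter)))
    return mem.flat()


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy labels: the "chosen object" drifts right (dx ~ +1), a distractor drifts left.
    samples = [make_sequence(+1.0, 0.2, rng) for _ in range(10)]
    samples += [make_sequence(-1.0, 0.2, rng) for _ in range(10)]
    labels = [1] * 10 + [0] * 10
    w, b = train_recognizer(samples, labels)
    print("target:", round(recognize(w, b, make_sequence(+1.0, 0.2, rng)), 3))
    print("distractor:", round(recognize(w, b, make_sequence(-1.0, 0.2, rng)), 3))
```

With a depth of 4 and two features per frame, the recognizer sees an 8-element input. Zero padding lets it produce an output before the memory has filled. Drawing fresh noise on every presentation is the usual way input-noise injection acts as a regularizer.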
