Surveillance video online prediction using multilayer ELM with object principal trajectory

Designing a valid machine learning model for online prediction of surveillance video is a challenging problem. To address it, a multilayer extreme learning machine (ELM) with object principal trajectory is proposed. The scheme takes both temporal and spatial characteristics into account in order to support dynamic semantic representation between adjacent frames. Coordinate distances are first computed with the K-means algorithm, so that the object regions can be separated at the pixel level. The moving trend of each object is then determined from the principal trajectory of the area of interest. Finally, a multilayer ELM is adopted to quantify the new shape characteristics, and this deep neural network helps generate a new frame sequence that is realistic. The proposed method not only recognizes multiple objects with different movement directions, but also effectively identifies subtle semantic features. The whole forecasting process avoids the trial and error caused by user intervention, which makes the model suitable for online environments. Numerical experiments are conducted on two different kinds of surveillance video datasets. The results show that the proposed algorithm performs better than other state-of-the-art methods.
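
As a rough illustration of why an ELM avoids iterative trial and error, the sketch below trains a basic single-hidden-layer ELM: the hidden weights are drawn at random and left fixed, and only the output weights are solved in closed form by regularized least squares. This is not the multilayer ELM of the paper; the function names `train_elm` and `predict_elm`, the tanh activation, the hidden size, and the ridge term `reg` are all assumptions made for this example, and the toy data is a synthetic 1-D curve rather than video.

```python
import numpy as np

def train_elm(X, T, n_hidden=50, reg=1e-6, seed=0):
    """Fit a basic single-hidden-layer ELM (illustrative, not the paper's model)."""
    rng = np.random.default_rng(seed)
    # Hidden-layer weights and biases are random and never updated.
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)  # hidden-layer output matrix
    # Output weights: ridge-regularized least squares, solved in one step.
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ T)
    return W, b, beta

def predict_elm(X, model):
    """Apply the fixed random hidden layer, then the learned output weights."""
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

# Toy regression problem (assumed data): recover a smooth 1-D curve.
X = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
T = np.sin(3.0 * X)
model = train_elm(X, T)
mse = float(np.mean((predict_elm(X, model) - T) ** 2))
print("training MSE:", mse)
```

Because training reduces to a single linear solve, there is no learning rate or stopping criterion to tune by hand, which is the property the abstract relies on for online use. A multilayer variant would stack several such random-feature layers before the final least-squares readout.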
