Instructor Activity Recognition through Deep Spatiotemporal Features and Feedforward Extreme Learning Machines

Human action recognition can be used to infer an instructor's activities within the lecture room. Evaluation of lecture delivery helps teachers identify shortcomings and plan lectures more effectively; however, manual or peer evaluation is time-consuming and tedious, and it is difficult to recall every detail of a lecture afterwards. Automating the evaluation of lecture delivery can therefore substantially support improvements in teaching style. In this paper, we propose a feedforward learning model for recognizing an instructor's activities in the lecture room. The proposed scheme represents a video sequence as a single frame that captures the instructor's motion profile by observing the spatiotemporal relations among the video frames. First, we segment the instructor's silhouette from the input videos using graph-cut segmentation and generate a motion profile. Each motion profile is centered by extracting its largest connected component and then normalized. These motion profiles are then encoded as feature maps by a deep convolutional neural network, and an extreme learning machine (ELM) classifier is trained on the resulting feature representations to recognize eight different instructor activities within the classroom. To evaluate the proposed method, we created an instructor activity video dataset (IAVID-1) and compared our method against several state-of-the-art activity recognition methods. Two standard datasets, MuHAVi and IXMAS, were also used to evaluate the proposed scheme.
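The classification stage pairs deep spatiotemporal features with an ELM, whose defining property is that the hidden-layer weights are drawn at random and fixed, while the output weights are obtained in closed form via the Moore-Penrose pseudoinverse. As a rough illustration of that training rule only (the tanh activation, hidden-layer size, and one-hot target encoding below are illustrative choices, not details taken from the paper), a minimal sketch might look like:

```python
import numpy as np

class ELMClassifier:
    """Minimal extreme learning machine: a random, untrained hidden layer
    followed by output weights solved by least squares (pseudoinverse)."""

    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        n_features = X.shape[1]
        self.classes_ = np.unique(y)
        # Random input weights and biases are fixed and never trained --
        # this is the core ELM idea.
        self.W = self.rng.standard_normal((n_features, self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = np.tanh(X @ self.W + self.b)  # hidden-layer activations
        # One-hot target matrix, one column per class.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Closed-form output weights: beta = pinv(H) @ T
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        H = np.tanh(X @ self.W + self.b)
        return self.classes_[np.argmax(H @ self.beta, axis=1)]
```

In the paper's pipeline, `X` would hold the CNN feature representations of the motion profiles and `y` the eight instructor-activity labels; here any numeric feature matrix works, since the ELM itself is agnostic to where the features come from.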
