Building a smart lecture-recording system using MK-CPN network for heterogeneous data sources

Nowadays, lecture-recording systems play a vital role in collecting spoken discourse for e-learning. However, in view of the growing development of e-learning, the lack of content is becoming a problem. This research presents a smart lecture-recording (SLR) system that can record orations at the same level of quality as a human team, but with a reduced degree of human involvement. The proposed SLR system is composed of two subsystems, referred to as the virtual cameraman (VC) and the virtual director (VD). All cameraman components of the VC subsystem are automatic and can take actions that include target and event detection, tracking, and view searching. The videos taken by these three components are forwarded to the VD subsystem, in which the representative shot is chosen for recording or direct broadcasting. We refer to this function of the VD subsystem as shot selection, which is based on content analysis. The capability of shot selection is pre-trained through a machine-learning process based on the counter-propagation neural (CPN) network. However, the CPN network yielded poor results when the input data were heterogeneous. To increase the accuracy of shot selection, we incorporated multiple kernel learning (MKL) techniques into the CPN network, yielding the MK-CPN network, which transforms all the heterogeneous data from the different content analysis methods into a unified space. A series of experiments on real lectures has been conducted. The results showed that the proposed SLR system can provide oration records that are, to some extent, close to those taken by real human teams. We believe that the proposed system need not be limited to live speeches if it is configured with appropriate training materials.
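The idea of mapping heterogeneous features into a unified space before a counter-propagation network can be illustrated with a minimal sketch. This is not the authors' MK-CPN implementation: it assumes one RBF kernel per feature source, fixed kernel weights in place of learned MKL weights, an empirical kernel map (similarity to each training sample) as the unified space, and a basic CPN with a winner-take-all Kohonen layer and a Grossberg output layer. All names, feature values, and the two shot labels (0 and 1) are hypothetical.

```python
import math

def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) similarity between two feature vectors of one source."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def combined_kernel(s, t, gammas, weights):
    """Weighted sum of per-source kernels; s and t are tuples of per-source vectors."""
    return sum(w * rbf_kernel(fs, ft, g)
               for fs, ft, g, w in zip(s, t, gammas, weights))

def to_unified_space(sample, anchors, gammas, weights):
    """Empirical kernel map: describe a sample by its similarity to each anchor."""
    return [combined_kernel(sample, a, gammas, weights) for a in anchors]

class CPN:
    """Basic counter-propagation network (assumed form, not the paper's)."""

    def __init__(self, prototypes, n_outputs):
        # Kohonen weights start from given prototypes to avoid dead units.
        self.w_in = [list(p) for p in prototypes]
        self.w_out = [[0.0] * n_outputs for _ in prototypes]

    def winner(self, x):
        # Winner-take-all: the hidden unit closest to x in Euclidean distance.
        dists = [sum((w - xi) ** 2 for w, xi in zip(unit, x)) for unit in self.w_in]
        return dists.index(min(dists))

    def train(self, xs, labels, epochs=20, alpha=0.3, beta=0.3):
        n_outputs = len(self.w_out[0])
        for _ in range(epochs):
            for x, label in zip(xs, labels):
                target = [1.0 if k == label else 0.0 for k in range(n_outputs)]
                j = self.winner(x)
                # Kohonen update: move the winning unit toward the input.
                self.w_in[j] = [w + alpha * (xi - w) for w, xi in zip(self.w_in[j], x)]
                # Grossberg update: move the winner's output weights toward the target.
                self.w_out[j] = [w + beta * (t - w) for w, t in zip(self.w_out[j], target)]

    def predict(self, x):
        out = self.w_out[self.winner(x)]
        return out.index(max(out))

# Hypothetical toy data: each sample has two sources on very different scales,
# a 2-D feature and a 1-D feature; labels 0 and 1 stand for two candidate shots.
train_samples = [
    ([0.1, 0.2], [100.0]),
    ([0.2, 0.1], [110.0]),
    ([0.9, 0.8], [300.0]),
    ([0.8, 0.9], [290.0]),
]
train_labels = [0, 0, 1, 1]
gammas = [1.0, 1e-4]   # assumed per-source kernel widths
weights = [0.5, 0.5]   # assumed fixed kernel weights (MKL would learn these)

unified = [to_unified_space(s, train_samples, gammas, weights) for s in train_samples]
net = CPN(prototypes=[unified[0], unified[2]], n_outputs=2)
net.train(unified, train_labels)

query = ([0.15, 0.15], [105.0])
shot = net.predict(to_unified_space(query, train_samples, gammas, weights))
```

Giving each source its own kernel width puts features with very different numeric ranges on a comparable similarity scale before the network's distance-based winner selection, which is the motivation the abstract gives for combining kernels with the CPN network.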
