The CAMETRON Lecture Recording System: High Quality Video Recording and Editing with Minimal Human Supervision

In this paper, we demonstrate a system that automates the recording of video lectures in classrooms. Using dedicated hardware (lecturer- and audience-facing cameras and microphone arrays), we record the lecture from multiple points of view. Person detection and tracking, combined with recognition of different human actions, are used to digitally zoom in on the lecturer and to alternate the focus between the lecturer and the slides or the blackboard. Sound source localization, together with face detection and tracking, is used to detect questions from the audience, to digitally zoom in on the audience member asking the question, and to improve the quality of the sound recording. Finally, an automatic video editing system switches naturally between the different video streams to compose a compelling end product. We demonstrate the working system in two classrooms, over two 2-hour lectures given by two lecturers.
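The abstract does not specify how sound source localization is performed. The following is a minimal sketch that assumes a single microphone pair from the array, spaced `mic_distance` metres apart, with a far-field source. It estimates the time difference of arrival (TDOA) by plain cross-correlation and then converts that delay into a direction angle. Widely used systems often apply a weighted variant such as GCC-PHAT instead. The function names, parameters, and the 343 m/s speed of sound are illustrative assumptions, not the CAMETRON implementation.

```python
import math

# Assumed speed of sound in air at room temperature (m/s); hypothetical constant.
SPEED_OF_SOUND = 343.0


def estimate_delay(x, y, max_lag):
    """Hypothetical helper: return the lag k (in samples) that maximises the
    cross-correlation r[k] = sum_n x[n] * y[n + k] over -max_lag..max_lag.
    A positive lag means y is delayed relative to x, so the sound reached
    the microphone recording x first."""
    best_lag, best_score = 0, -math.inf
    for k in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(x)):
            j = i + k
            if 0 <= j < len(y):
                score += x[i] * y[j]
        if score > best_score:
            best_lag, best_score = k, score
    return best_lag


def direction_of_arrival(lag, sample_rate, mic_distance):
    """Hypothetical helper: far-field angle in degrees from broadside for a
    two-microphone pair. A positive angle means the source lies on the side
    of the microphone that recorded x (the one the sound reached first)."""
    tau = lag / sample_rate                     # delay in seconds
    s = SPEED_OF_SOUND * tau / mic_distance     # sin(theta) under the far-field model
    s = max(-1.0, min(1.0, s))                  # clamp values pushed past +/-1 by noise
    return math.degrees(math.asin(s))


if __name__ == "__main__":
    # Synthetic example: the same short pulse arrives 3 samples later at mic y.
    x = [0.0] * 64
    x[10], x[11] = 1.0, 0.5
    y = [0.0] * 64
    y[13], y[14] = 1.0, 0.5
    lag = estimate_delay(x, y, max_lag=8)
    print(lag, round(direction_of_arrival(lag, 16000, 0.1), 1))
```

With a 16 kHz sample rate and microphones 10 cm apart, a 3-sample delay corresponds to roughly 40 degrees off broadside. The brute-force search is quadratic in the signal length. A practical implementation would compute the correlation with an FFT and combine several microphone pairs to resolve the direction unambiguously.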
