Automating lecture capture and broadcast: technology and videography

Abstract. Our goal is to help automate the capture and broadcast of lectures to online audiences. Such systems have two interrelated design components. The technology component includes hardware and associated software. The aesthetic component comprises the rules and idioms that human videographers follow to make a video visually engaging; these rules guide hardware placement and software algorithms. We report the design of a complete system that captures and broadcasts lectures automatically and that has been deployed in our organization for 2 years. We also report on a user study and on a detailed set of video-production rules obtained from professional videographers who critiqued the system. We describe how the system can be generalized to a variety of lecture room environments differing in room size and number of cameras. Finally, we discuss gaps between what professional videographers do and what is technologically feasible today.
