Automating camera management for lecture room environments

Given rapid improvements in network infrastructure and streaming-media technologies, a large number of corporations and universities are recording lectures and making them available online for anytime, anywhere access. However, producing high-quality lecture videos is still labor intensive and expensive. Fortunately, recent technology advances are making it feasible to build automated camera management systems to capture lectures. In this paper we report on our design, implementation, and study of such a system. Compared to previous work, which has tended to be technology centric, we started with interviews with professional video producers and used their knowledge and expertise to create video production rules. We then targeted technology components that allowed us to implement a substantial portion of these rules, including the design of a virtual video director. The system's performance was compared to that of a human operator via a user study. Results suggest that our system's quality is close to that of a human-controlled system. In fact, most remote audience members could not tell whether the video was produced by a computer or a person.
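The virtual video director described above can be thought of as a rule-based controller that picks a camera in response to sensed classroom events. The sketch below is purely illustrative: the camera names, event labels, and timing thresholds are assumptions for this example, not the rules elicited from the professional producers in the paper.

```python
# Illustrative sketch of a rule-based virtual video director.
# Camera names, events, and timing thresholds are hypothetical.

MIN_SHOT = 4   # seconds: avoid cutting faster than a human editor would
MAX_SHOT = 30  # seconds: avoid lingering on any one shot too long

class VirtualDirector:
    def __init__(self):
        self.current = "speaker-cam"
        self.shot_start = 0.0

    def select(self, t, event):
        """Pick a camera at time t given a sensed event.

        event is one of: "speaker-moves", "audience-question",
        "slide-change", or None.
        """
        elapsed = t - self.shot_start
        if elapsed < MIN_SHOT:
            return self.current          # rule: never cut too quickly
        nxt = self.current
        if event == "audience-question":
            nxt = "audience-cam"         # rule: show who is asking
        elif event == "slide-change":
            nxt = "slide-cam"            # rule: show the new slide
        elif event == "speaker-moves" or elapsed > MAX_SHOT:
            nxt = "speaker-cam"          # rule: default to the lecturer
        if nxt != self.current:
            self.current = nxt
            self.shot_start = t
        return self.current
```

Encoding editing conventions as explicit, inspectable rules like these (minimum shot length, event-triggered cuts, a default framing) is what lets an automated system approximate the decisions a human operator would make.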
