Using audio time scale modification for video browsing

In the IBM CueVideo project, we study various aspects of fully automated video indexing, browsing and retrieval. The technical aspects include audio processing, speech recognition, image processing and information retrieval. Equally important, however, are exploring user expectations and conducting user studies. We focus on video for Training and Education, including Distributed Learning, Remote Education, and Just-in-Time Learning. This paper describes a novel use of an audio processing technology, audio Time Scale Modification (TSM): fast video browsing and efficient video-based learning. The paper gives a brief overview of the CueVideo system, the technical background of TSM, and the way TSM is used in our system. The results of our usability study on the effect of TSM on speech comprehension indicate that TSM is very useful for fast video browsing.
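TSM changes the playback rate of speech without changing its pitch. This is what lets a viewer skim a recorded lecture at, say, 1.5x or 2x and still follow the speaker. The abstract does not say which TSM algorithm CueVideo uses. The sketch below assumes a waveform-similarity overlap-add (WSOLA) approach, one well-known time-domain TSM method, written in plain Python. The names `hann` and `wsola`, and the default frame and tolerance sizes, are illustrative assumptions, not details taken from the paper.

```python
import math
from typing import List


def hann(n: int) -> List[float]:
    """Periodic Hann window; for even n, copies shifted by n/2 sum to exactly 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def wsola(x: List[float], speed: float, frame: int = 512, tol: int = 128) -> List[float]:
    """Play `x` back `speed` times faster without shifting its pitch.

    Assumed WSOLA-style sketch (not the CueVideo implementation): input frames
    are read at a hop of speed * (frame // 2). Each one is shifted by up to
    +/- tol samples so that it best matches the natural continuation of the
    previously copied frame, and is then overlap-added into the output at a
    fixed hop of frame // 2.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    hs = frame // 2                       # synthesis (output) hop
    ha = speed * hs                       # analysis (input) hop, may be fractional
    win = hann(frame)
    # Zero-pad so every candidate segment stays inside the buffer.
    xp = [0.0] * tol + list(x) + [0.0] * (frame + 2 * tol + hs)
    n_frames = math.ceil(len(x) / ha)
    y = [0.0] * ((n_frames - 1) * hs + frame)
    wsum = [0.0] * len(y)

    prev = tol                            # start (in xp) of the last copied segment
    for k in range(n_frames):
        nominal = tol + round(k * ha)     # where frame k would be read without the search
        start = nominal
        if k > 0:
            # Natural continuation of the previous segment in the input signal.
            target = xp[prev + hs : prev + hs + frame]
            best = -math.inf
            for d in range(-tol, tol + 1):
                seg = xp[nominal + d : nominal + d + frame]
                score = sum(a * b for a, b in zip(seg, target))  # cross-correlation
                if score > best:
                    best, start = score, nominal + d
        off = k * hs
        for j in range(frame):
            y[off + j] += win[j] * xp[start + j]
            wsum[off + j] += win[j]
        prev = start

    # Undo the window gain (it only differs from 1 where fewer than two frames overlap).
    y = [v / w if w > 1e-8 else 0.0 for v, w in zip(y, wsum)]
    return y[: round(len(x) / speed)]


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(sr)]  # 1 s of a 200 Hz tone
    fast = wsola(tone, speed=2.0, frame=256, tol=64)
    print(len(fast))  # 4000: half the duration, still a 200 Hz tone
```

Reading input frames faster than they are written shortens the signal. The similarity search keeps successive frames phase-aligned, so the tone keeps its period instead of rising in pitch as plain resampling would. The brute-force correlation search costs on the order of tol * frame operations per frame, which is acceptable for a sketch; a practical system would more likely use a faster search.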
