Detecting Semantic Concepts from Video Using Temporal Gradients and Audio Classification

This paper describes new methods for detecting semantic concepts in digital video from its audio and visual content. The Temporal Gradient Correlogram captures temporal correlations of gradient edge directions in frames sampled from a shot. Power-related physical features are extracted from short audio samples taken from video shots. Shots containing people, cityscape, landscape, speech, or instrumental sound are detected using trained self-organizing maps and kNN classification of the audio samples. Test runs and evaluations in the TREC 2002 Video Track show consistent performance for the Temporal Gradient Correlogram and state-of-the-art precision in audio-based instrumental sound detection.
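To make the idea of a temporal correlation of gradient edge directions concrete, the following is a minimal sketch and not the formulation used in this work. It assumes grayscale frames given as nested lists, central-difference gradients, four orientation bins, and a simplified autocorrelogram that compares only the same pixel position across frames. The names `direction_bins` and `temporal_gradient_correlogram`, and all parameter values, are hypothetical.

```python
import math


def direction_bins(frame, n_bins=4, min_mag=1e-6):
    """Quantize the gradient orientation of each interior pixel.

    Returns a grid of bin indices; -1 marks border and flat pixels.
    """
    h, w = len(frame), len(frame[0])
    bins = [[-1] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central differences (an assumed, simple gradient operator).
            gx = frame[y][x + 1] - frame[y][x - 1]
            gy = frame[y + 1][x] - frame[y - 1][x]
            if math.hypot(gx, gy) <= min_mag:
                continue
            # Fold the direction into an orientation in [0, pi).
            angle = math.atan2(gy, gx) % math.pi
            bins[y][x] = min(int(angle / math.pi * n_bins), n_bins - 1)
    return bins


def temporal_gradient_correlogram(frames, distances=(1, 2), n_bins=4):
    """For each temporal distance d and orientation bin b, estimate the
    probability that a pixel with bin b in frame t keeps bin b in frame t+d.
    """
    binned = [direction_bins(f, n_bins) for f in frames]
    feature = []
    for d in distances:
        hits = [0] * n_bins
        totals = [0] * n_bins
        for t in range(len(binned) - d):
            for row_a, row_b in zip(binned[t], binned[t + d]):
                for a, b in zip(row_a, row_b):
                    if a < 0:
                        continue  # no usable gradient at this pixel
                    totals[a] += 1
                    if a == b:
                        hits[a] += 1
        feature.extend(h / n if n else 0.0 for h, n in zip(hits, totals))
    return feature


# Toy frames: a vertical edge and a horizontal edge, 5x5 pixels each.
vertical = [[0 if x < 2 else 1 for x in range(5)] for _ in range(5)]
horizontal = [[0 if y < 2 else 1 for _ in range(5)] for y in range(5)]

static_shot = temporal_gradient_correlogram([vertical] * 3)
changing_shot = temporal_gradient_correlogram([vertical, horizontal], distances=(1,))
```

In this sketch a shot whose edge structure is stable over time yields values near 1 in the occupied orientation bins, while a shot whose edges change orientation yields values near 0. The resulting vector has one entry per (distance, bin) pair and could serve as input to a classifier such as the self-organizing maps mentioned above.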
