Video Segmentation and Shot Boundary Detection Using Self-Organizing Maps

We present a video shot boundary detection (SBD) algorithm that spots discontinuities in the visual stream by monitoring the trajectories of video frames on Self-Organizing Maps (SOMs). The SOM mapping compensates for differences in probability density across the feature space, so distances between SOM coordinates are more informative than distances between raw feature vectors. Instead of measuring the distance between just two trajectory points, the proposed method compares two sliding windows of best-matching units (BMUs), which makes the detector more robust. The method can be seen as a variant of adaptive-threshold SBD methods. Robustness is increased further by combining multiple SOM-based detectors in a committee machine. The TRECVID evaluation conducted by NIST confirms that the SOM-based SBD method performs comparatively well in news video segmentation, especially in detecting gradual transitions.
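The window comparison described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the window distance, the threshold, or the voting rule, so the choices here (minimum Euclidean distance between map coordinates, a fixed threshold, and a vote count) are assumptions, as are all function and parameter names. The SOM codebook is assumed to be already trained and stored as an array of shape (rows, cols, dim).

```python
import numpy as np


def best_matching_units(features, codebook):
    """Map each frame's feature vector to the (row, col) of its nearest SOM unit."""
    rows, cols, dim = codebook.shape
    flat = codebook.reshape(rows * cols, dim)
    # Squared Euclidean distance from every frame to every map unit.
    d2 = ((features[:, None, :] - flat[None, :, :]) ** 2).sum(axis=2)
    idx = d2.argmin(axis=1)
    return np.stack(np.unravel_index(idx, (rows, cols)), axis=1).astype(float)


def window_distance(bmus, t, w):
    """Smallest map distance between the w BMUs before frame t and the w BMUs from t on.

    Assumed measure: the abstract only says that two sliding windows are compared.
    """
    before = bmus[t - w:t]
    after = bmus[t:t + w]
    diff = before[:, None, :] - after[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).min()


def detect_boundaries(bmus, w=3, threshold=2.0):
    """Flag frame t as a boundary when the two windows lie far apart on the map."""
    flags = np.zeros(len(bmus), dtype=bool)
    for t in range(w, len(bmus) - w + 1):
        flags[t] = window_distance(bmus, t, w) > threshold
    return flags


def committee_vote(flag_list, min_votes):
    """Combine several detectors: keep frames flagged by at least min_votes of them."""
    return np.sum(flag_list, axis=0) >= min_votes
```

In this sketch, each detector in the committee would run `best_matching_units` and `detect_boundaries` on its own SOM (for example, one map per feature type), and `committee_vote` would merge the resulting flag arrays. Using the minimum distance between the windows means that a single outlier frame inside a shot does not trigger a detection, because the remaining frames in the window still match the other side.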
