1. SYSTEM TECHNICAL DESCRIPTION

The ClassMiner system is a fully implemented tool for scalable video skimming and summarization. Its key technology is an integrated process for mining the content structure and events of medical videos, which was presented at the SIGMOD Workshop on Data Mining and Knowledge Discovery (DMKD) [2]. As the system architecture in Fig. 1 indicates, we first apply a general video shot segmentation and key-frame selection scheme to parse the video stream into physical units. Video group detection, scene detection, and clustering strategies are then executed to mine the video content structure. Various visual and audio feature processing techniques are used to detect semantic cues within the video, such as slides, faces, and speaker changes, and these detection results are combined to mine three types of events (presentation, dialog, and clinical operation) from the detected video scenes. Finally, a scalable video skimming and summarization tool is built on the mined content structure and event information to help users visualize and access video content.
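The first stage of this pipeline, shot segmentation followed by key-frame selection, can be illustrated with a minimal sketch. The text describes the scheme only as "general", so the technique below is an assumption and not ClassMiner's actual implementation: a hard-cut detector that declares a shot boundary when the L1 distance between the normalized color histograms of consecutive frames exceeds a threshold, with each shot's key frame taken as the frame closest to the shot's mean histogram. The function names, the 0.5 threshold, and the representation of frames as precomputed histograms are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical input format: one normalized color histogram per frame (bins sum to 1).
Histogram = List[float]
Shot = Tuple[int, int]  # half-open frame range [start, end)


def hist_distance(a: Histogram, b: Histogram) -> float:
    """L1 distance between two normalized histograms; lies in [0, 2]."""
    return sum(abs(x - y) for x, y in zip(a, b))


def segment_shots(frames: List[Histogram], threshold: float = 0.5) -> List[Shot]:
    """Split the frame sequence into shots, cutting wherever two consecutive
    frames differ by more than `threshold` (detects abrupt cuts only)."""
    if not frames:
        return []
    shots: List[Shot] = []
    start = 0
    for i in range(1, len(frames)):
        if hist_distance(frames[i - 1], frames[i]) > threshold:
            shots.append((start, i))
            start = i
    shots.append((start, len(frames)))
    return shots


def select_key_frame(frames: List[Histogram], shot: Shot) -> int:
    """Return the index of the frame whose histogram is closest to the
    shot's mean histogram (ties go to the earliest frame)."""
    start, end = shot
    n = end - start
    bins = len(frames[start])
    mean = [sum(frames[i][b] for i in range(start, end)) / n for b in range(bins)]
    return min(range(start, end), key=lambda i: hist_distance(frames[i], mean))


# Usage: three dark frames followed by two bright ones (3-bin histograms).
dark = [0.9, 0.1, 0.0]
bright = [0.0, 0.2, 0.8]
frames = [dark, dark, [0.85, 0.15, 0.0], bright, bright]
shots = segment_shots(frames)                          # → [(0, 3), (3, 5)]
key_frames = [select_key_frame(frames, s) for s in shots]  # → [0, 3]
```

A single global threshold catches only abrupt cuts; gradual transitions such as fades and dissolves would need an additional test, for example one that accumulates smaller frame-to-frame differences over a window.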
REFERENCES

[1] Jianping Fan, et al. Automatic model-based semantic object extraction algorithm. IEEE Trans. Circuits Syst. Video Technol., 2001.
[2] Jianping Fan, et al. ClassMiner: Mining Medical Video Content Structure and Events Towards Efficient Access and Scalable Skimming. DMKD, 2002.
[3] Jianping Fan, et al. Hierarchical video summarization for medical data. IS&T/SPIE Electronic Imaging, 2001.
[4] Jianping Fan, et al. Towards facial feature extraction and verification for omni-face detection in video/images. Proceedings, International Conference on Image Processing, 2002.
[5] Christian Wellekens, et al. DISTBIC: A speaker-based segmentation for audio data indexing. Speech Commun., 2000.