Detection of Duplicated Scenes from MPEG Streams Based on Generated Code Size

We developed a technique to detect duplicated scenes in MPEG streams, such as commercial messages, TV program theme songs, and other repeated scenes. Conventional matching techniques require numerous calculations to estimate the similarity between scenes. In our technique, similarity is estimated from compressed code sizes, without decoding pictures. We divide scenes into shots by detecting cut points, which most likely correspond to large amounts of generated code in the MPEG stream. The pictures in these shots are never decoded; only the shot lengths are used for scene matching. If the series of shot lengths for scene A is the same as that for scene B, we can infer that the two scenes are identical. This technique, called shot length matching (SLM), requires no image processing and runs very fast. We applied SLM to 80-minute MPEG streams stored on a hard disk drive to detect and delete duplicated scenes, and we obtained 99.5% precision and a processing time of 0.157 s.
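The idea of shot length matching can be illustrated with a minimal Python sketch. It is not the implementation evaluated here: it assumes that per-picture code sizes have already been read from the stream, that a cut is any picture whose code size exceeds a fixed multiple of the mean (the `ratio` threshold is hypothetical), and that two scenes match when `k` consecutive shot lengths are exactly equal. The function names `detect_cuts`, `shot_lengths`, and `find_duplicates` are illustrative only.

```python
from typing import Dict, List, Tuple


def detect_cuts(code_sizes: List[int], ratio: float = 3.0) -> List[int]:
    """Return indices of pictures whose code size exceeds ratio * mean.

    Assumption: this simple threshold rule stands in for the real
    cut-point detector, which is not specified in the abstract.
    """
    if not code_sizes:
        return []
    mean = sum(code_sizes) / len(code_sizes)
    return [i for i, size in enumerate(code_sizes) if size > ratio * mean]


def shot_lengths(cuts: List[int], total: int) -> List[int]:
    """Convert cut positions into shot lengths, measured in pictures."""
    bounds = [0] + [c for c in cuts if 0 < c < total] + [total]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def find_duplicates(lengths: List[int], k: int = 3) -> List[Tuple[int, int]]:
    """Return (earlier, later) shot-index pairs that start equal runs of k lengths.

    Assumption: exact equality of k consecutive shot lengths is taken
    as evidence that two scenes are identical.
    """
    seen: Dict[Tuple[int, ...], int] = {}
    matches: List[Tuple[int, int]] = []
    for i in range(len(lengths) - k + 1):
        key = tuple(lengths[i:i + k])
        if key in seen:
            matches.append((seen[key], i))  # same run seen earlier
        else:
            seen[key] = i
    return matches
```

Because only integer shot lengths are compared and each window is looked up in a hash table, the matching step is linear in the number of shots, which is consistent with the short processing time reported above. In practice the comparison would probably need a small tolerance, since cut detection can be off by a picture or two.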