Web Video Data Clustering and Recognition Using Histograms of Phoneme Symbols

The clustering and recognition of Web video content play an important role in multimedia information retrieval. This paper proposes a method for both clustering and recognizing Web video content using a histogram of phoneme symbols (HoPS), a feature that captures information about the speech and sound in a video interval. Three experiments were conducted. The first experiment embedded the HoPS features of video intervals in a 3D space using PCA and quantification method IV (Q-IV). The second experiment applied the k-nearest neighbor (k-NN) method to analyze the difficulty of clustering. The third experiment recognized unknown video intervals by using the distance between the HoPS of the query and the average HoPS of each category. The recognition accuracy was 44.3% with the Mahalanobis distance and 36.9% with the correlation distance, in both cases measured against the category averages of the training data.
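The third experiment can be illustrated with a minimal sketch of nearest-category-average recognition. The abstract does not specify how HoPS is computed or how the covariance is estimated, so everything below is an assumption made for illustration: the function names, the normalized-count histogram, the pooled training covariance with a pseudo-inverse, the category labels "A" and "B", and the toy data are all hypothetical and are not the authors' implementation.

```python
import numpy as np

def hops(symbol_ids, n_symbols):
    """Assumed HoPS: normalized count of each phoneme symbol ID in one interval."""
    counts = np.bincount(np.asarray(symbol_ids), minlength=n_symbols).astype(float)
    total = counts.sum()
    return counts / total if total > 0 else counts

def category_means(features, labels):
    """Average HoPS vector per category, computed from training data."""
    labels = np.asarray(labels)
    return {c: features[labels == c].mean(axis=0) for c in np.unique(labels)}

def correlation_distance(u, v):
    """One minus the Pearson correlation between two vectors."""
    uc, vc = u - u.mean(), v - v.mean()
    denom = np.linalg.norm(uc) * np.linalg.norm(vc)
    return 1.0 if denom == 0 else 1.0 - float(uc @ vc) / denom

def mahalanobis_distance(u, v, cov_inv):
    """Mahalanobis distance given an (pseudo-)inverse covariance matrix."""
    d = u - v
    return float(np.sqrt(max(d @ cov_inv @ d, 0.0)))

def classify(query, means, distance):
    """Return the category whose average is closest to the query."""
    return min(means, key=lambda c: distance(query, means[c]))

# Toy training set: 3 hypothetical phoneme symbols, 2 hypothetical categories.
train = np.array([
    hops([0, 0, 0, 1], 3),
    hops([0, 0, 1, 1], 3),
    hops([2, 2, 2, 1], 3),
    hops([2, 2, 1, 1], 3),
])
labels = ["A", "A", "B", "B"]
means = category_means(train, labels)

# Histograms sum to 1, so the covariance is singular; the pseudo-inverse
# is an assumption made here to keep the Mahalanobis distance defined.
cov_inv = np.linalg.pinv(np.cov(train, rowvar=False))

query = hops([0, 0, 1, 2], 3)
pred_corr = classify(query, means, correlation_distance)
pred_maha = classify(query, means, lambda u, v: mahalanobis_distance(u, v, cov_inv))
print(pred_corr, pred_maha)
```

Passing the distance as a function keeps the two measures interchangeable, which mirrors the comparison reported in the abstract. A per-category covariance could be used in place of the pooled one; the abstract does not say which was chosen.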
