Paper Information - Web Video Data Clustering and Recognition Using Histograms of Phoneme Symbols


The clustering and recognition of Web video content play an important role in multimedia information retrieval. This paper proposes a method for both clustering and recognizing Web video content using a histogram of phoneme symbols (HoPS). HoPS contains information about speech and sound intervals. In this study, three experiments were conducted. The first experiment placed the HoPS features of video intervals in a 3D space using principal component analysis (PCA) and quantification method IV (Q-IV). The second experiment applied the k-nearest neighbor (k-NN) method to analyze the difficulties in clustering. The third experiment recognized unknown video intervals by using the distance between the HoPS of the query and a category average. The recognition accuracy was 44.3% with the Mahalanobis distance and 36.9% with the correlation distance, each measured against the category averages of the training data.
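The recognition step in the third experiment can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy phoneme sequences, the four-symbol alphabet, the category labels, and the choice of a pseudo-inverse of the covariance pooled over all training vectors are assumptions made only for illustration. It builds a normalized histogram per interval, averages the histograms per category, and assigns a query to the category whose average is nearest.

```python
import numpy as np

def hops(symbols, n_symbols):
    """Normalized histogram of phoneme symbol ids for one video interval."""
    counts = np.bincount(np.asarray(symbols), minlength=n_symbols).astype(float)
    total = counts.sum()
    return counts / total if total > 0 else counts

def category_means(features, labels):
    """Average HoPS vector per category (labels are assumed to be strings)."""
    labels = np.asarray(labels)
    return {str(c): features[labels == c].mean(axis=0) for c in np.unique(labels)}

def correlation_distance(u, v):
    """One minus the Pearson correlation between u and v."""
    uc, vc = u - u.mean(), v - v.mean()
    denom = np.linalg.norm(uc) * np.linalg.norm(vc)
    if denom == 0:
        return 1.0
    return 1.0 - float(uc @ vc) / denom

def mahalanobis_distance(u, v, inv_cov):
    """Mahalanobis distance between u and v for a given inverse covariance."""
    d = u - v
    return float(np.sqrt(max(d @ inv_cov @ d, 0.0)))

def classify(query, means, distance):
    """Return the category whose average HoPS is closest to the query."""
    return min(means, key=lambda c: distance(query, means[c]))

# Hypothetical toy data: phoneme ids in 0..3, two made-up categories.
train_seqs = [[0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 3],
              [2, 3, 3, 2], [3, 3, 2, 3], [2, 2, 3, 3]]
train_labels = ["speech", "speech", "speech", "music", "music", "music"]

features = np.array([hops(s, 4) for s in train_seqs])
means = category_means(features, train_labels)

# Assumption: covariance pooled over all training vectors. Normalized
# histograms sum to one, so the covariance is singular and a pseudo-inverse
# is used in place of a plain inverse.
inv_cov = np.linalg.pinv(np.cov(features, rowvar=False))

query = hops([0, 0, 1, 1], 4)
print(classify(query, means, correlation_distance))
print(classify(query, means, lambda u, v: mahalanobis_distance(u, v, inv_cov)))
```

Passing the distance as a function keeps the nearest-average rule identical for both measures, so the two accuracies reported in the abstract would differ only through the distance itself.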

Yusuke Sakai | Yuichi Yaguchi | Ryuichi Oka | Keisuke Yoshida
