An audio classification and speech recognition system for video content analysis

Audio can provide useful information for video content analysis. This paper proposes an audio classification and speech recognition system for that purpose. First, the audio data is extracted from the video stream. Second, the audio frames are classified as silence, speech, or music using a combination of rules and a Support Vector Machine (SVM). Finally, an automatic speech recognition (ASR) system converts the speech segments to text. Experimental results on the CCTV_NEWS data from TRECVID show that the approach is effective.
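The rule-based part of the classification step can be illustrated with a minimal sketch. The abstract does not state which features or thresholds the authors use, so everything below is an assumption made for illustration: short-time energy as the silence rule, zero-crossing rate as a second feature, and the hypothetical names `frame_signal`, `split_silence`, and `energy_threshold`. Frames that pass the silence rule would then be handed to a trained SVM for the speech/music decision, which is not shown here.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a sample sequence into overlapping fixed-length frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def split_silence(samples, frame_len=400, hop=160, energy_threshold=1e-4):
    """Rule stage (assumed): label low-energy frames as silence.

    Returns one label per frame, plus (energy, zcr) feature pairs for the
    non-silent frames, which a separate classifier such as an SVM could
    then separate into speech and music.
    """
    labels, features = [], []
    for frame in frame_signal(samples, frame_len, hop):
        energy = short_time_energy(frame)
        if energy < energy_threshold:
            labels.append("silence")
        else:
            labels.append("non-silence")
            features.append((energy, zero_crossing_rate(frame)))
    return labels, features

# Synthetic input: one second of digital silence followed by a 440 Hz tone.
sample_rate = 16000
silence = [0.0] * sample_rate
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate)]
labels, features = split_silence(silence + tone)
```

The default frame length and hop correspond to 25 ms and 10 ms at 16 kHz, a common choice in speech processing rather than a value taken from the paper. In practice the energy threshold would need tuning against the noise floor of the broadcast audio.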
