Automatic attendance rating of movie content using bag of audio words representation

The sensory experience of watching a movie links input from both the sight and hearing modalities. Yet the motion picture rating system has traditionally relied largely on the visual content of a film in making its informed recommendations to parents. The current rating process is fairly elaborate: it requires a group of parents to attend a full screening, manually prepare and submit their opinions, and vote on the appropriate audience age for viewing. Our work instead explores the feasibility of classifying the age attendance of a movie automatically, by analyzing the movie's auditory data alone. Our high performance software records the audio content of the shorter movie trailer and builds a labeled training set of original and artificially distorted clips. We use a bag of audio words to represent the film soundtrack effectively, and demonstrate robust and closely correlated classification accuracy using both boolean discrimination and ranked retrieval methods.
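To make the bag-of-audio-words idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each clip has already been reduced to a list of frame-level feature vectors (the 13-dimensional synthetic frames below are a stand-in), that the codebook is learned with Lloyd-style k-means, and that histograms are compared with cosine similarity for ranked retrieval. The abstract fixes none of these choices, and all function names are hypothetical.

```python
import math
import random

def nearest(frame, codebook):
    """Index of the codeword closest to one frame (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda j: sum((x - c) ** 2 for x, c in zip(frame, codebook[j])))

def build_codebook(frames, k, iters=10, seed=0):
    """Lloyd-style k-means over frame feature vectors; returns k codewords."""
    rng = random.Random(seed)
    codebook = [list(f) for f in rng.sample(frames, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for f in frames:
            groups[nearest(f, codebook)].append(f)
        for j, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous codeword
                codebook[j] = [sum(col) / len(members) for col in zip(*members)]
    return codebook

def bag_of_audio_words(frames, codebook):
    """Normalized histogram of codeword occurrences for one clip."""
    counts = [0] * len(codebook)
    for f in frames:
        counts[nearest(f, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts]

def cosine(a, b):
    """Cosine similarity between two histograms (assumed retrieval score)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Synthetic stand-in for the frame-level features of one trailer clip.
rng = random.Random(1)
clip = [[rng.gauss(0, 1) for _ in range(13)] for _ in range(200)]

codebook = build_codebook(clip, k=8)       # learn 8 audio words
hist = bag_of_audio_words(clip, codebook)  # fixed-length clip representation
```

Each clip, whatever its duration, becomes a fixed-length histogram, so it can be fed to a standard classifier or ranked against labeled clips by similarity. In practice the codebook would be learned from the frames of the whole training set, not from a single clip.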
