Neural network-based movie dialogue detection

A novel framework for dialogue detection in movies, based on indicator functions, is investigated. An indicator function specifies whether an actor is present at a particular time instant. The cross-correlation function of a pair of indicator functions and the corresponding cross-power spectral density are used as inputs to neural networks. Several types of neural networks are employed to test the feasibility of the proposed framework, including perceptrons, radial-basis function networks, and support vector machines. Experiments are conducted on indicator functions extracted from six different movies, which yield a total of 41 dialogue instances and 20 non-dialogue instances. High detection accuracy is achieved: the average accuracy ranges between 84.780%±5.499% and 94.740%±5.263%, with a mean value of 88.990%±2.967%.
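
The feature construction described above can be illustrated with a minimal sketch. It assumes discrete-time, binary indicator functions of equal length and takes the magnitude of the discrete Fourier transform of the cross-correlation sequence as the cross-power spectral density estimate. The helper names (`indicator`, `cross_correlation`, `cross_power_spectrum`) and the toy scene are hypothetical and are not taken from the paper, whose exact preprocessing and classifier inputs may differ.

```python
import numpy as np

def indicator(intervals, n_samples):
    """Return a 0/1 array that is 1 at the samples where the actor is present.

    `intervals` is a list of (start, stop) sample indices, stop exclusive.
    """
    x = np.zeros(n_samples)
    for start, stop in intervals:
        x[start:stop] = 1.0
    return x

def cross_correlation(a, b):
    """Full cross-correlation of two equal-length sequences.

    The output has 2N-1 entries covering lags -(N-1) .. (N-1).
    """
    return np.correlate(a, b, mode="full")

def cross_power_spectrum(a, b):
    """Magnitude of the DFT of the cross-correlation sequence.

    Assumption: this is one simple estimate of the cross-power
    spectral density, not necessarily the estimator used in the paper.
    """
    return np.abs(np.fft.fft(cross_correlation(a, b)))

# Hypothetical toy scene: two actors alternate in 5-sample turns.
n = 40
actor_a = indicator([(0, 5), (10, 15), (20, 25), (30, 35)], n)
actor_b = indicator([(5, 10), (15, 20), (25, 30), (35, 40)], n)

xcorr = cross_correlation(actor_a, actor_b)
lags = np.arange(-(n - 1), n)          # lag axis matching xcorr
peak_lag = lags[np.argmax(xcorr)]      # lag of strongest alignment

# One possible fixed-length feature vector for a classifier.
features = np.concatenate([xcorr, cross_power_spectrum(actor_a, actor_b)])
```

In this toy scene the two indicator functions never overlap, so the cross-correlation is zero at lag 0 and peaks at a lag of one turn length, which is the kind of alternating structure a classifier could learn to associate with dialogue.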
