Ensemble audio segmentation for radio and television programmes

State-of-the-art audio segmentation strategies obtain good results on simple tasks, but their performance degrades when segmenting real-world material such as radio and television programmes; this issue can be partially mitigated by fusing different audio segmentation strategies. Hence, this paper presents a framework for decision-level fusion in the audio segmentation task. First, the class-conditional probabilities of each audio segmentation strategy are estimated from a confusion matrix obtained by performing audio segmentation on a training dataset. Performance measures derived from these class-conditional probabilities are then used to compute different estimates of each classifier's reliability; specifically, reliability estimates based on precision, recall, accuracy, F-score and mutual information are proposed. These reliability estimates are used as weights in a weighted majority voting fusion strategy. The validity of the proposed fusion scheme and reliability estimates was assessed in the framework of the Albayzin 2010, 2012 and 2014 audio segmentation evaluations, which consisted of segmenting collections of radio and television programmes. The experimental results showed that this simple fusion strategy improves on the performance of the individual audio segmentation strategies and of other well-known decision-level fusion strategies.
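
The abstract only outlines how the reliability weights are obtained. The following Python sketch shows one plausible reading: class-conditional probabilities P(ŷ = j | y = i) are taken from the rows of a training confusion matrix, combined with the empirical class priors, and turned into the five measures named above. The matrix layout (rows = reference class, columns = hypothesised class), the macro-averaging over classes, the use of bits for mutual information and the function name `reliability_estimates` are assumptions for illustration, not details confirmed by the paper.

```python
import math


def reliability_estimates(confusion):
    """Reliability estimates for one segmenter from its confusion matrix.

    Assumed layout: confusion[i][j] counts training frames whose reference
    class is i and which the segmenter labelled as class j.
    Returns macro-averaged precision, recall and F-score, plus accuracy and
    the mutual information (in bits) between reference and output labels.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # Class priors P(y = i) and class-conditional probabilities P(y_hat = j | y = i).
    priors = [sum(row) / total for row in confusion]
    cond = [[c / sum(row) if sum(row) else 0.0 for c in row] for row in confusion]
    # Joint distribution P(y = i, y_hat = j) and output marginal P(y_hat = j).
    joint = [[priors[i] * cond[i][j] for j in range(n)] for i in range(n)]
    p_hat = [sum(joint[i][j] for i in range(n)) for j in range(n)]

    precision, recall, f_score = [], [], []
    for k in range(n):
        rec = cond[k][k]  # P(y_hat = k | y = k)
        prec = joint[k][k] / p_hat[k] if p_hat[k] else 0.0  # P(y = k | y_hat = k)
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precision.append(prec)
        recall.append(rec)
        f_score.append(f)

    accuracy = sum(joint[k][k] for k in range(n))
    mutual_info = sum(
        joint[i][j] * math.log2(joint[i][j] / (priors[i] * p_hat[j]))
        for i in range(n)
        for j in range(n)
        if joint[i][j] > 0
    )
    return {
        "precision": sum(precision) / n,
        "recall": sum(recall) / n,
        "f_score": sum(f_score) / n,
        "accuracy": accuracy,
        "mutual_information": mutual_info,
    }


# Hypothetical two-class example (e.g. speech vs. non-speech frames).
print(reliability_estimates([[8, 2], [1, 9]]))
```

Any one of these numbers could then serve as the scalar weight of that segmenter; a per-class variant (for instance using the precision of the class a system outputs) is an equally plausible design that the abstract does not rule out.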

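Once each segmenter has a reliability estimate, weighted majority voting gives every class a score equal to the summed weights of the systems that chose it and keeps the highest-scoring class. The following Python sketch applies that rule frame by frame; the frame-level granularity, the function name `weighted_majority_vote`, the class labels and the weights in the usage example are hypothetical choices for illustration, not the configuration used in the paper.

```python
from collections import defaultdict


def weighted_majority_vote(labels_per_system, weights):
    """Fuse frame-level class labels from several segmenters.

    labels_per_system[m][t] is the label system m assigns to frame t, and
    weights[m] is that system's reliability estimate (for example its
    accuracy or F-score measured on training data).
    """
    if len(labels_per_system) != len(weights):
        raise ValueError("need exactly one weight per system")
    n_frames = len(labels_per_system[0])
    if any(len(labels) != n_frames for labels in labels_per_system):
        raise ValueError("all systems must label the same number of frames")

    fused = []
    for t in range(n_frames):
        scores = defaultdict(float)
        for labels, w in zip(labels_per_system, weights):
            scores[labels[t]] += w  # each system votes with its reliability
        # max() keeps the first maximal key and dicts preserve insertion order,
        # so a tie goes to the tied label that received a vote earliest.
        fused.append(max(scores, key=scores.get))
    return fused


# Hypothetical outputs of three segmenters over four frames.
systems = [
    ["speech", "speech", "music", "noise"],
    ["speech", "music", "music", "speech"],
    ["music", "music", "speech", "speech"],
]
print(weighted_majority_vote(systems, [0.9, 0.3, 0.4]))
print(weighted_majority_vote(systems, [1.0, 1.0, 1.0]))
```

With these made-up weights, the first system overrides the other two in the second and fourth frames, whereas equal weights reduce the rule to plain majority voting.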