INA'S MIREX 2018 MUSIC AND SPEECH DETECTION SYSTEM

A convolutional neural network (CNN) based architecture is proposed for the MIREX 2018 music and speech detection challenge. The system operates on log-mel filterbank features and comprises 4 convolutional layers followed by 4 dense layers. It is part of inaSpeechSegmenter, an open-source framework originally designed for conducting gender equality studies.
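The abstract does not give the network's hyperparameters; the following is a minimal Keras sketch of the kind of architecture described (4 convolutional layers over log-mel patches followed by 4 dense layers). The mel-band count, patch length, filter counts, kernel sizes, dense widths, and the 3-class output (speech / music / other) are illustrative assumptions, not the values used in inaSpeechSegmenter.

    # Hypothetical sketch of the described architecture: 4 conv layers + 4 dense layers
    # on log-mel filterbank patches. All hyperparameters below are assumptions.
    import tensorflow as tf
    from tensorflow.keras import layers, models

    N_MELS = 21      # assumed number of log-mel filterbank coefficients
    N_FRAMES = 68    # assumed number of frames per input patch

    def build_cnn():
        inputs = layers.Input(shape=(N_FRAMES, N_MELS, 1))
        x = inputs
        # 4 convolutional layers (filter counts and kernel sizes are assumptions)
        for n_filters in (32, 32, 64, 64):
            x = layers.Conv2D(n_filters, (3, 3), padding="same", activation="relu")(x)
            x = layers.MaxPooling2D((2, 2))(x)
        x = layers.Flatten()(x)
        # 3 hidden dense layers plus a softmax output layer: 4 dense layers in total
        for n_units in (256, 128, 64):
            x = layers.Dense(n_units, activation="relu")(x)
            x = layers.Dropout(0.25)(x)
        outputs = layers.Dense(3, activation="softmax")(x)  # assumed speech/music/other classes
        return models.Model(inputs, outputs)

    model = build_cnn()
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    model.summary()

In practice the trained model would be applied frame-wise over a recording and the per-patch predictions smoothed into speech and music segments; the code above only sketches the classifier itself.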
