BIRD: Big Impulse Response Dataset

This paper introduces BIRD, the Big Impulse Response Dataset. This open dataset consists of 100,000 multichannel room impulse responses (RIRs) generated from simulations using the Image Method, making it the largest multichannel open dataset currently available. These RIRs can be used to perform efficient online data augmentation for scenarios that involve two microphones and multiple sound sources. The paper also presents use cases illustrating how BIRD can be used for data augmentation with existing speech corpora.
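
As an illustration of the augmentation workflow described above, the sketch below convolves a dry single-channel speech signal with a simulated two-channel RIR to obtain a reverberant two-channel signal. The file names, storage format, and loading steps are assumptions for illustration only; they are not the official BIRD API.

```python
# Minimal sketch of RIR-based data augmentation (assumed file names and formats;
# not the official BIRD loading API).
import numpy as np
import soundfile as sf                      # pip install soundfile
from scipy.signal import fftconvolve

# Dry single-channel speech, e.g. from an existing corpus (hypothetical path).
speech, fs = sf.read("clean_speech.wav")

# Two-channel room impulse response with shape (2, rir_len).
# Assumed here to be stored as a NumPy array; adapt to however the RIRs are shipped.
rir = np.load("rir_2ch.npy")

# Convolve the dry speech with each microphone's impulse response to
# simulate the reverberant two-channel signal captured by the array.
wet = np.stack([fftconvolve(speech, rir[ch]) for ch in range(rir.shape[0])])

# Normalize to avoid clipping and save the augmented example.
wet /= np.max(np.abs(wet)) + 1e-9
sf.write("reverberant_speech.wav", wet.T, fs)
```

In an online setting, the same convolution would typically be applied on the fly inside the data loader, drawing a different RIR for each training example rather than writing augmented files to disk.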
