SIGTYP 2021 Shared Task: Robust Spoken Language Identification

While language identification is a fundamental speech and language processing task, it remains challenging for many languages and language families. For many low-resource and endangered languages this is in part due to resource availability: where larger datasets exist, they may be single-speaker or drawn from domains that differ from the desired application scenarios, creating a need for domain- and speaker-invariant language identification systems. This year’s shared task on robust spoken language identification sought to investigate just this scenario: systems were to be trained on largely single-speaker speech from one domain, but evaluated on data from other domains, recorded from speakers under different recording circumstances, mimicking realistic low-resource scenarios. We see that domain and speaker mismatch proves very challenging for current methods, which can reach above 95% accuracy in-domain. Domain adaptation can address this mismatch to some degree, but these conditions merit further investigation to make spoken language identification accessible in many scenarios.
