Evaluation methodology in ASR and performance bias (Pratiques d'évaluation en ASR et biais de performance)

We offer a reflection on evaluation practices for automatic speech recognition (ASR) systems. After defining discrimination from a legal standpoint and fairness in artificial intelligence systems, we examine current practices in major evaluation campaigns. We observe that speech variability, and in particular variability tied to the individual speaker, is not taken into account in current evaluation protocols, which makes it impossible to study potential biases in these systems.
