Fearless Steps: Apollo-11 Corpus Advancements for Speech Technologies from Earth to the Moon

The Apollo Program is one of the most significant benchmarks for technology and innovation in human history. The previously introduced UTD-CRSS Fearless Steps initiative digitized the original analog audio tapes recorded during the Apollo space missions. The entire speech data for the Apollo 11 mission is now being made publicly available with the release of the Fearless Steps Corpus. The corpus consists of a cumulative 19,000 hours of conversational speech spanning over thirty time-synchronized channels. With over six hundred speakers, it offers a rich collection of information that can benefit research and advancement in the speech and language community. Recent efforts on this data have produced pipeline diarization transcripts for the entire speech corpus. Prior work has also addressed speech and natural language tasks such as speech activity detection, speech recognition, and sentiment analysis. This paper provides an overview of the Fearless Steps Corpus and highlights the factors that make processing this data a challenging problem. To promote further development of algorithms for naturalistic data, five challenge tasks have been organized. We describe these tasks, give details on a fully transcribed subset of the corpus, and report the initial results achieved by our systems.
