PitchKeywordExtractor: Prosody-based automatic keyword extraction for speech content

Keyword extraction is widely used for indexing, compressing, and summarizing information. Existing keyword extraction techniques apply various text-based algorithms and metrics to locate keywords. However, some types of audio and audiovisual content, e.g. lectures, talks, interviews, and other speech-oriented material, also allow keywords to be located through the prosodic accents made by a speaker. This paper presents PitchKeywordExtractor, an algorithm and software prototype for prosody-based automatic keyword extraction from speech content. It operates together with a third-party automatic speech recognition system, captures speech prosody with a pitch detection algorithm, and locates keywords by cross-correlating the pitch contour with four tone units taken from D. Brazil's discourse intonation model.
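The matching step can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a speech recognizer has already supplied word boundaries and that a pitch detector has supplied per-word pitch values in Hz. The four template shapes, the function names, the correlation threshold, and the minimum pitch range are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import math

# Hypothetical idealized tone shapes (assumption: the abstract does not
# list the four tone units or their exact contours).
TEMPLATES = {
    "fall": [1.0, 0.75, 0.5, 0.25, 0.0],
    "rise": [0.0, 0.25, 0.5, 0.75, 1.0],
    "rise_fall": [0.0, 0.5, 1.0, 0.5, 0.0],
    "fall_rise": [1.0, 0.5, 0.0, 0.5, 1.0],
}


def resample(contour, n):
    """Linearly interpolate a pitch contour to exactly n points."""
    if len(contour) == 1:
        return [contour[0]] * n
    out = []
    for i in range(n):
        pos = i * (len(contour) - 1) / (n - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(contour) - 1)
        frac = pos - lo
        out.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return out


def zscore(values):
    """Centre and scale a sequence; a flat sequence maps to all zeros."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]


def normalized_correlation(a, b):
    """Zero-lag normalized cross-correlation of two equal-length sequences."""
    za, zb = zscore(a), zscore(b)
    return sum(x * y for x, y in zip(za, zb)) / len(za)


def best_tone(word_pitch):
    """Return the best-matching template name and its correlation score."""
    contour = resample(word_pitch, 5)
    scores = {name: normalized_correlation(contour, shape)
              for name, shape in TEMPLATES.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


def extract_keywords(words, threshold=0.8, min_range_hz=20.0):
    """Flag words whose pitch contour strongly matches a tone template.

    words: list of (text, [pitch values in Hz]) pairs.
    threshold and min_range_hz are illustrative assumptions; the range
    check keeps near-flat contours from matching on tiny fluctuations.
    """
    keywords = []
    for text, pitch in words:
        if max(pitch) - min(pitch) < min_range_hz:
            continue
        tone, score = best_tone(pitch)
        if score >= threshold:
            keywords.append((text, tone))
    return keywords
```

For example, `extract_keywords([("the", [150, 150, 150]), ("prosody", [120, 160, 210, 160, 125])])` flags only "prosody", because its contour rises and then falls over a wide pitch range while "the" stays flat. A real system would also need to handle unvoiced frames and pitch-tracking errors, which this sketch ignores.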
