Paper information - PitchKeywordExtractor: Prosody-based automatic keyword extraction for speech content

Keyword extraction is widely used for information indexing, compression, summarization, and similar tasks. Existing keyword extraction techniques apply various text-based algorithms and metrics to locate keywords. At the same time, some types of audio and audiovisual content, e.g., lectures, talks, interviews and other speech-oriented material, allow keywords to be located by the prosodic accents made by the speaker. This paper presents PitchKeywordExtractor, an algorithm and software prototype for prosody-based automatic keyword extraction from speech content. It operates together with a third-party automatic speech recognition system, captures speech prosody with a pitch detection algorithm, and locates keywords by cross-correlating the pitch contour with four tone units taken from D. Brazil's discourse intonation model.
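The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that an ASR system has already supplied word boundaries as frame indices and that a pitch detector has produced one F0 value per frame. The four template shapes, their names, the 0.9 threshold, and every function name below are hypothetical choices made only for illustration, and the normalized correlation is computed at zero lag after stretching each template to the word's length.

```python
import math

# Hypothetical tone templates (assumed shapes, not taken from the paper).
TEMPLATES = {
    "fall": [1.0, 0.75, 0.5, 0.25, 0.0],
    "rise": [0.0, 0.25, 0.5, 0.75, 1.0],
    "fall-rise": [1.0, 0.5, 0.0, 0.5, 1.0],
    "rise-fall": [0.0, 0.5, 1.0, 0.5, 0.0],
}


def resample(values, n):
    """Linearly resample a sequence to n points."""
    if n == 1:
        return [values[0]]
    out = []
    for i in range(n):
        pos = i * (len(values) - 1) / (n - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


def pearson(a, b):
    """Zero-lag normalized cross-correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # a flat contour carries no tone shape
    return sum(x * y for x, y in zip(da, db)) / denom


def best_tone(segment):
    """Return the best-matching template name and its correlation score."""
    scores = {
        name: pearson(segment, resample(shape, len(segment)))
        for name, shape in TEMPLATES.items()
    }
    name = max(scores, key=scores.get)
    return name, scores[name]


def extract_keywords(words, pitch, threshold=0.9):
    """words: (text, start_frame, end_frame) tuples; pitch: F0 in Hz per frame."""
    keywords = []
    for text, start, end in words:
        segment = pitch[start:end]
        if len(segment) < 3:
            continue  # too short to carry a contour shape
        tone, score = best_tone(segment)
        if score >= threshold:  # assumed decision rule
            keywords.append((text, tone, round(score, 3)))
    return keywords
```

Calling `extract_keywords(words, pitch)` with a list of word intervals and a per-frame pitch track returns the words whose contour closely follows one of the templates, together with the matched tone and score. A real system would also need to handle unvoiced frames and smooth the pitch track before matching; both are omitted here.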

Evgeny Pyshkin | Natalia Bogach | Elena Boitsova | Yurij Lezhenin | Artyom Zhuikov
