pyAudioAnalysis: An Open-Source Python Library for Audio Signal Analysis

Audio information plays an important role in the ever-increasing volume of digital content available today, creating a need for methodologies that automatically analyze such content: audio event recognition for home automation and surveillance systems, speech recognition, music information retrieval, multimodal analysis (e.g. audio-visual analysis of online videos for content-based recommendation), and so on. This paper presents pyAudioAnalysis, an open-source Python library that provides a wide range of audio analysis procedures, including feature extraction, classification of audio signals, supervised and unsupervised segmentation, and content visualization. pyAudioAnalysis is licensed under the Apache License and is available on GitHub (https://github.com/tyiannak/pyAudioAnalysis/). Here we present the theoretical background behind the implemented methodologies, along with evaluation metrics for some of the methods. pyAudioAnalysis has already been used in several audio analysis research applications: smart-home functionalities through audio event detection, speech emotion recognition, depression classification based on audio-visual features, music segmentation, multimodal content-based movie recommendation, and health applications (e.g. monitoring eating habits). The feedback from these applications has led to practical enhancements of the library.
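To make the short-term feature extraction step concrete, the following is a minimal, self-contained sketch (NumPy only, not the library's actual API) of windowed short-term analysis in the spirit of what pyAudioAnalysis implements: the signal is split into overlapping frames and a small feature vector is computed per frame. Here only frame energy and zero-crossing rate are shown for illustration; the library itself extracts a much richer feature set (MFCCs, chroma, spectral features, etc.), and the function and parameter names below are illustrative assumptions.

```python
import numpy as np

def short_term_features(signal, fs, win=0.050, step=0.025):
    """Frame the signal and compute per-frame energy and zero-crossing rate.

    A simplified illustration of short-term feature extraction, NOT the
    pyAudioAnalysis API: the real library returns many more features.
    """
    win_len = int(win * fs)    # frame length in samples (50 ms)
    step_len = int(step * fs)  # hop size in samples (25 ms, 50% overlap)
    feats = []
    for start in range(0, len(signal) - win_len + 1, step_len):
        frame = signal[start:start + win_len]
        # Short-term energy: mean squared amplitude of the frame
        energy = float(np.sum(frame ** 2) / win_len)
        # Zero-crossing rate: fraction of consecutive samples changing sign
        zcr = float(np.mean(np.abs(np.diff(np.sign(frame)))) / 2)
        feats.append((energy, zcr))
    return np.array(feats)  # shape: (num_frames, 2)

# Example on a synthetic 1-second 440 Hz tone sampled at 16 kHz
fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)
feats = short_term_features(tone, fs)
```

Mid-term features, used by the library for segment-level classification, are then typically statistics (mean, standard deviation) of these short-term feature sequences over longer windows.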
