SpeechPy - A Library for Speech Processing and Recognition

SpeechPy is an open-source Python package that contains speech preprocessing techniques, speech features, and important post-processing operations. It provides the most frequently used speech features, including MFCCs and filterbank energies, along with the log-energy of the filterbanks. The aim of the package is to provide researchers with a simple tool for speech feature extraction and processing in applications such as Automatic Speech Recognition and Speaker Verification.
