Paper information - SpeechPy - A Library for Speech Processing and Recognition

SpeechPy is an open-source Python package that provides speech preprocessing techniques, speech features, and important post-processing operations. It covers the most frequently used speech features, including MFCCs and filterbank energies, along with the log-energy of the filterbanks. The package aims to give researchers a simple tool for speech feature extraction and processing in applications such as Automatic Speech Recognition and Speaker Verification.
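To make the features named above concrete, the following is a minimal standard-library sketch of how log mel filterbank energies and MFCCs are commonly computed. It is not SpeechPy's API or implementation: every function name here (`frame_signal`, `power_spectrum`, `mel_filterbank`, `log_filterbank_energies`, `mfcc`) is hypothetical, the parameter values are arbitrary, and a practical implementation would use NumPy and an FFT rather than the naive DFT shown.

```python
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def frame_signal(signal, frame_len, frame_step):
    """Split the signal into overlapping frames, dropping the incomplete tail."""
    count = 1 + (len(signal) - frame_len) // frame_step
    return [signal[i * frame_step:i * frame_step + frame_len] for i in range(count)]


def power_spectrum(frame, n_fft):
    """Hamming-window a frame, then return |DFT|^2 / n_fft for bins 0..n_fft/2."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n_fft // 2 + 1):
        # Naive DFT; using n_fft in the exponent is equivalent to zero-padding.
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n_fft)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n_fft)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * hz / sample_rate)) for hz in edges_hz]
    bank = [[0.0] * (n_fft // 2 + 1) for _ in range(num_filters)]
    for m in range(num_filters):
        left, center, right = bins[m], bins[m + 1], bins[m + 2]
        for k in range(left, center):      # rising slope
            bank[m][k] = (k - left) / (center - left)
        for k in range(center, right):     # falling slope
            bank[m][k] = (right - k) / (right - center)
    return bank


def log_filterbank_energies(signal, sample_rate, frame_len, frame_step, n_fft, num_filters):
    """Per-frame log of the mel filterbank energies (floored to avoid log(0))."""
    bank = mel_filterbank(num_filters, n_fft, sample_rate)
    features = []
    for frame in frame_signal(signal, frame_len, frame_step):
        spec = power_spectrum(frame, n_fft)
        energies = [sum(w * p for w, p in zip(row, spec)) for row in bank]
        features.append([math.log(max(e, 1e-12)) for e in energies])
    return features


def mfcc(log_energies, num_cepstral):
    """Unnormalized DCT-II of each frame's log energies, keeping the first coefficients."""
    if not log_energies:
        return []
    n = len(log_energies[0])
    return [[sum(v * math.cos(math.pi * c * (j + 0.5) / n) for j, v in enumerate(frame))
             for c in range(num_cepstral)]
            for frame in log_energies]


# Toy input: 0.1 s of a 440 Hz tone sampled at 8 kHz (illustrative values only).
sample_rate = 8000
signal = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(800)]
fbank = log_filterbank_energies(signal, sample_rate, frame_len=160, frame_step=80,
                                n_fft=256, num_filters=10)
coeffs = mfcc(fbank, num_cepstral=5)
print(len(fbank), len(fbank[0]), len(coeffs[0]))
```

Each row of `fbank` is one frame's log filterbank energies, and each row of `coeffs` is the corresponding cepstral vector. Post-processing steps such as mean and variance normalization would operate on these per-frame rows.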

Amirsina Torfi
