Paper information - Memory-aware i-vector extraction by means of sub-space factorization

Most state-of-the-art speaker recognition systems use i-vectors, a compact representation of spoken utterances. Since the “standard” i-vector extraction procedure requires large memory structures, we recently presented the Factorized Sub-space Estimation (FSE) approach, an efficient technique that dramatically reduces the memory needed for i-vector extraction and is also fast and accurate compared to other proposed approaches. FSE approximates the matrix T, which represents the speaker variability sub-space, by the product of appropriately designed matrices. In this work, we introduce and evaluate a further approximation of the matrices that contribute most to the memory costs of the FSE approach, showing that it is possible to obtain comparable system accuracy using less than half of the FSE memory, which corresponds to a memory reduction of more than 60 times with respect to the standard method of i-vector extraction.
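To make the memory issue concrete, here is a minimal NumPy sketch of the standard (non-factorized) i-vector point estimate, w = (I + Σ_c N_c T_cᵀ T_c)⁻¹ Σ_c T_cᵀ f_c. It is not the FSE method of the paper. The function names, array layout, and the assumption that the statistics are already whitened (so each component covariance is the identity) are illustrative choices, not taken from the abstract.

```python
import numpy as np

def extract_ivector(T, N, f):
    """Standard i-vector point estimate (sketch, whitened statistics assumed).

    T: (C, F, M) array, one F x M block of the sub-space matrix per GMM component.
    N: (C,) zero-order statistics of the utterance.
    f: (C, F) centered, whitened first-order statistics.
    """
    C, F, M = T.shape
    # Per-component M x M products T_c^T T_c. Fast implementations precompute
    # and store these, which is the memory-hungry part of the standard method.
    TtT = np.einsum('cfm,cfn->cmn', T, T)
    # Posterior precision: I + sum_c N_c * T_c^T T_c
    L = np.eye(M) + np.einsum('c,cmn->mn', N, TtT)
    # Right-hand side: sum_c T_c^T f_c
    b = np.einsum('cfm,cf->m', T, f)
    return np.linalg.solve(L, b)

def precomputed_floats(C, M):
    """Number of values needed to store the C symmetric M x M products."""
    return C * M * (M + 1) // 2
```

The count returned by `precomputed_floats` grows with the number of components times the square of the i-vector dimension, which is why approximating T by a product of smaller matrices can pay off in memory.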

Pietro Laface | Sandro Cumani
