Investigating speaker authentication system vulnerability to the limited duration of speech excerpts and voice cloning

The impact of the length of the reference sample and of the authentication sample on the accuracy of speaker authentication employing a deep learning architecture was tested in bank branches and is discussed in this paper. The work focuses on testing different approaches to parameterizing voice credentials, using mel-frequency cepstral coefficients (MFCC), linear predictive coding (LPC) coefficients, and gammatone frequency cepstral coefficients (GFCC) as extracted features. A mixed approach, using a supervector containing the most important coefficients from each parameterization method, was also examined. Standard speaker authentication corpora, VoxCeleb2 and LibriSpeech, were used along with our own recordings. A second aim of the work was to investigate the resistance of the machine-learning-based speaker verification system to voice-cloning attacks. The impact of the duration of speech excerpts on the vulnerability to this type of attack was examined, as was the influence of the quality and length of the generated recordings used for the attack. The results were found to depend on the acoustic conditions in bank branches, where there is considerable noise from banknote counters, the stamping of documents, and conversations. [Project No. POIR.01.01.01-0092/19 entitled: “BIOPUAP—a biometric cloud authentication system” is currently financed by the Polish National Centre for Research and Development (NCBR) from the European Regional Development Fund.]