Vowel Phoneme Segmentation for Speaker Identification Using an ANN-Based Framework

Abstract: Vowel phonemes are part of any acoustic speech signal, and vowel sounds occur in speech more frequently and with higher energy than consonants. Vowel phonemes can therefore be used to extract speaker-discriminative information even when the acoustic signal is corrupted by noise. This article presents an approach that identifies a speaker from the vowel sound segmented out of words spoken by that speaker. The work uses a combined self-organizing map (SOM) and probabilistic neural network (PNN) approach to segment the vowel phoneme. The segmented vowel is then used to identify the speaker of the word by matching its patterns against a learning vector quantization (LVQ)-based codebook. The LVQ codebook is prepared from features of clean vowel phonemes uttered by the male and female speakers to be identified. The proposed work formulates a framework for the design of a speaker-recognition model for the Assamese language, which is spoken by ∼3 million people in the Northeast Indian state of Assam. The experimental results show that the SOM-based technique improves the segmentation success rate by at least 7% compared with the discrete wavelet transform-based technique. This increase contributes to an improvement of ∼3% in overall speaker-identification performance compared with earlier related works.
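The codebook-matching step described in the abstract can be illustrated with a minimal LVQ1 sketch. This is not the authors' implementation: the function names (`train_lvq1`, `identify`), the single prototype per speaker, the learning-rate schedule, and the toy two-dimensional feature vectors are all assumptions made for illustration, and the SOM/PNN segmentation stage is not shown.

```python
import random


def nearest(codebook, x):
    """Index of the codebook entry whose prototype is closest to x."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i][0], x)),
    )


def train_lvq1(samples, labels, epochs=30, lr=0.1, seed=0):
    """Train an LVQ1 codebook with one prototype per speaker label.

    Assumption: each prototype starts at the first sample of its speaker.
    """
    rng = random.Random(seed)
    codebook = []
    for lab in sorted(set(labels)):
        first = next(s for s, l in zip(samples, labels) if l == lab)
        codebook.append((list(first), lab))

    order = list(range(len(samples)))
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # linearly decaying learning rate
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            proto, lab = codebook[nearest(codebook, x)]
            # LVQ1 rule: pull the winner toward x if the labels agree,
            # push it away otherwise.
            sign = 1.0 if lab == y else -1.0
            for d in range(len(proto)):
                proto[d] += sign * rate * (x[d] - proto[d])
    return codebook


def identify(codebook, x):
    """Return the speaker label of the prototype nearest to feature vector x."""
    return codebook[nearest(codebook, x)][1]


# Toy vowel feature vectors for two hypothetical speakers "A" and "B".
features = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
            [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]]
speakers = ["A", "A", "A", "B", "B", "B"]
codebook = train_lvq1(features, speakers)
print(identify(codebook, [1.1, 1.0]))
```

In a real system the feature vectors would come from the segmented vowel (for example, formant or spectral features), and several prototypes per speaker would normally be used.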
