Phonotactic Model for Spoken Language Identification in Indian Language Perspective

Indian languages are either Indo-Aryan, influenced by Sanskrit, or Dravidian, influenced by Tamil; the Dravidian languages are also influenced by Sanskrit. All Indian languages have been influenced by Pali, whose graphemes were influenced by Brahmi. All Indian languages are phonetic in nature, and each has its own distinctive phone set. North Indian languages are Indo-Aryan and South Indian languages are Dravidian. Taking into account the phonetic properties of each language as it is spoken, we model the characteristic consonant-vowel (CV) behaviour of its syllables and, using a Support Vector Machine (SVM) classifier trained on the limited data available, are able to identify the language. In doing so, we analyse the parallel phone recognition (PPR) language modelling approach for four major Indian languages (Hindi, Bengali, Oriya, and Telugu) and obtain encouraging results.

General Terms: Spoken Language Identification, Speech Processing, Support Vector Machine
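To make "CV behaviour" concrete, the following minimal sketch maps a phone sequence to its consonant/vowel skeleton and counts C/V n-grams, yielding a fixed-length vector of the kind that could be fed to an SVM (for example scikit-learn's `sklearn.svm.SVC`). The vowel inventory, the function names `cv_pattern` and `cv_ngram_features`, and the use of relative bigram frequencies are all illustrative assumptions, not the feature set used in this work.

```python
from collections import Counter
from itertools import product

# Hypothetical vowel inventory in ASCII transliteration; a real system would use
# the phone set produced by its phone recognizer for each language.
VOWELS = {"a", "aa", "i", "ii", "u", "uu", "e", "ai", "o", "au"}


def cv_pattern(phones):
    """Map each phone to 'V' if it is in VOWELS, otherwise to 'C'."""
    return "".join("V" if p in VOWELS else "C" for p in phones)


def cv_ngram_features(phones, n=2):
    """Relative frequency of every C/V n-gram, in a fixed order (CC, CV, VC, VV for n=2)."""
    pattern = cv_pattern(phones)
    grams = [pattern[i:i + n] for i in range(len(pattern) - n + 1)]
    counts = Counter(grams)
    total = max(len(grams), 1)  # avoid division by zero for very short inputs
    keys = ["".join(k) for k in product("CV", repeat=n)]
    return [counts[k] / total for k in keys]


# "kamal" -> CVCVC -> bigrams CV, VC, CV, VC
print(cv_pattern(["k", "a", "m", "a", "l"]))         # CVCVC
print(cv_ngram_features(["k", "a", "m", "a", "l"]))  # [0.0, 0.5, 0.5, 0.0]
```

Fixing the key order with `itertools.product` keeps every utterance's vector aligned dimension by dimension, which is what a vector-space classifier such as an SVM requires.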

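The phonotactic language-modelling idea behind PPR-style identification can be sketched as follows: a phone sequence (supplied directly here; in practice it comes from a phone recognizer) is scored against a phone bigram model trained for each language, and the language with the highest log-likelihood is chosen. This is a simplified, single-tokenizer (PRLM-style) sketch under stated assumptions: a shared phone inventory, bigram order, and add-one smoothing. Parallel variants run several phone recognizers and combine their scores. The class `PhoneBigramLM`, the function `identify`, and the toy data for `lang_A` and `lang_B` are hypothetical.

```python
import math
from collections import Counter


class PhoneBigramLM:
    """Add-one smoothed phone bigram model for a single language (illustrative)."""

    def __init__(self, sequences, phone_set):
        # Possible next symbols: every phone in the shared inventory plus the end marker.
        self.num_outcomes = len(phone_set) + 1
        self.bigrams = Counter()
        self.context = Counter()
        for seq in sequences:
            padded = ["<s>"] + list(seq) + ["</s>"]
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[(prev, cur)] += 1
                self.context[prev] += 1

    def log_prob(self, seq):
        """Natural-log likelihood of a phone sequence under this model."""
        padded = ["<s>"] + list(seq) + ["</s>"]
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            p = (self.bigrams[(prev, cur)] + 1) / (self.context[prev] + self.num_outcomes)
            total += math.log(p)
        return total


def identify(models, phones):
    """Return the language whose phonotactic model scores `phones` highest, plus all scores."""
    scores = {lang: lm.log_prob(phones) for lang, lm in models.items()}
    return max(scores, key=scores.get), scores


# Toy, hypothetical training data: lang_A favours CV alternation,
# lang_B allows consonant clusters.
PHONES = {"k", "a", "t", "m", "n", "s", "r"}
TRAINING = {
    "lang_A": [["k", "a", "t", "a"], ["m", "a", "n", "a"]],
    "lang_B": [["k", "t", "a", "n"], ["s", "t", "r", "a"]],
}
MODELS = {lang: PhoneBigramLM(seqs, PHONES) for lang, seqs in TRAINING.items()}

best, _ = identify(MODELS, ["t", "a", "k", "a"])
print(best)  # lang_A
```

Working in log probabilities avoids numerical underflow on long utterances. The per-language scores returned by `identify` could also serve as a feature vector for a back-end classifier such as an SVM, rather than taking the maximum directly.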