Applying Functional Partition in the Investigation of Lexical Tonal-Pattern Categories in an Under-Resourced Chinese Dialect

The present study applied functional partition to investigate disyllabic lexical tonal-pattern categories in an under-resourced Chinese dialect, Jinan Mandarin. A two-stage partitioning procedure was introduced to process, in a semi-automatic way, a multi-speaker corpus that contains irregular lexical variants. In the first stage, a program suggests lexical tonal variants for the recordings of each word, and the phonetician decides among them; the suggestions are based on the result of a functional k-means partitioning algorithm and on tonal information from an available pronunciation dictionary of a related Chinese dialect, Standard Chinese. The second stage iterates a functional version of k-means partitioning with Silhouette-based criteria to derive an optimal number of tonal patterns from the whole corpus; it also allows phoneticians to adjust the results of the automatic procedure in a controlled way and to redo the partitioning for a subset of clusters. The procedure yielded eleven disyllabic tonal patterns for Jinan Mandarin, representing the tonal system used by contemporary Jinan Mandarin speakers from a wide range of age groups. In this respect, the study differs from previous linguistic descriptions, which were based on the pronunciations of more elderly speakers. The method incorporates phoneticians’ linguistic knowledge and preliminary linguistic resources into the partitioning procedure. It can improve efficiency and objectivity in the investigation of lexical tonal-pattern categories when building pronunciation dictionaries for under-resourced languages.
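
To make the core of the second stage concrete, the following is a minimal sketch of k-means on discretized contours with the number of clusters chosen by mean silhouette width. It is not the authors' implementation: the synthetic contours, the 20-point time grid, the L2 distance approximated by a Riemann sum, the deterministic farthest-point seeding, and all function names are assumptions made for illustration only.

```python
import numpy as np


def l2_distance(a, b, dt):
    """Approximate the L2 distance between contours sampled on a shared time grid."""
    return np.sqrt(np.sum((a - b) ** 2, axis=-1) * dt)


def farthest_point_init(curves, k, dt):
    """Deterministic seeding (an assumption of this sketch): start from the first
    contour, then repeatedly add the contour farthest from the chosen centers."""
    chosen = [0]
    for _ in range(k - 1):
        d = l2_distance(curves[:, None, :], curves[chosen][None, :, :], dt).min(axis=1)
        chosen.append(int(d.argmax()))
    return curves[chosen].copy()


def functional_kmeans(curves, k, dt, n_iter=100):
    """k-means on discretized contours: assign by L2 distance, update by pointwise mean."""
    centers = farthest_point_init(curves, k, dt)
    for _ in range(n_iter):
        dists = l2_distance(curves[:, None, :], centers[None, :, :], dt)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            curves[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def mean_silhouette(curves, labels, dt):
    """Mean silhouette width (Rousseeuw, 1987) under the same L2 distance."""
    pair = l2_distance(curves[:, None, :], curves[None, :, :], dt)
    cluster_ids = set(labels.tolist())
    scores = []
    for i in range(len(curves)):
        same = labels == labels[i]
        if same.sum() == 1:  # singleton clusters get a silhouette of 0
            scores.append(0.0)
            continue
        a = pair[i, same].sum() / (same.sum() - 1)  # mean distance within own cluster
        b = min(pair[i, labels == j].mean() for j in cluster_ids if j != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


# Synthetic stand-ins for normalized f0 contours: high level, rising, falling.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 20)
dt = t[1] - t[0]
shapes = [np.ones_like(t), 2 * t - 1, 1 - 2 * t]
curves = np.vstack([s + rng.normal(0.0, 0.05, size=(30, t.size)) for s in shapes])

# Try several cluster counts and keep the one with the highest mean silhouette.
scores, results = {}, {}
for k in range(2, 6):
    results[k], _ = functional_kmeans(curves, k, dt)
    scores[k] = mean_silhouette(curves, results[k], dt)
best_k = max(scores, key=scores.get)
best_labels = results[best_k]
print("selected number of patterns:", best_k)
```

In the procedure described above, the phonetician can additionally adjust the automatic result and re-partition a subset of clusters; that interactive step is not modeled in this sketch.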
