Multimodal neural pronunciation modeling for spoken languages with logographic origin

Graphemes of most languages encode pronunciation, though some do so more explicitly than others. Languages like Spanish have a straightforward mapping between their graphemes and phonemes, while this mapping is more convoluted for languages like English. Spoken languages such as Cantonese present even greater challenges for pronunciation modeling: (1) they do not have a standard written form, and (2) their closest graphemic origins are logographic Han characters, only a subset of which implicitly encodes pronunciation. In this work, we propose a multimodal approach to predicting the pronunciation of Cantonese logographic characters, using neural networks with a geometric representation of logographs and the pronunciation of cognates in historically related languages. The proposed framework improves performance by 18.1% and 25.0% over unimodal and multimodal baselines, respectively.
