Enhanced Double-Carrier Word Embedding via Phonetics and Writing

Word embeddings map words into a unified vector space and capture rich semantic information. From a linguistic point of view, words have two carriers: speech and writing. Yet most recent word embedding models focus only on the written carrier and ignore the role of speech in semantic expression. In the development of language, however, speech appears before writing and strongly shapes how writing develops; for phonetic writing systems, written forms are secondary symbols of spoken ones. Based on this idea, we propose double-carrier word embedding (DCWE), which simulates the order in which speech and writing emerge: we train the written embedding on top of the phonetic embedding, and the final word embedding fuses the written and phonetic representations. To show that our model applies to most languages, we select Chinese, English, and Spanish as examples and evaluate DCWE through word similarity and text classification experiments.
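The abstract only sketches the final fusion step. As a minimal illustration, the snippet below assumes the fusion is a weighted concatenation of the two unit-normalized views; the function name `fuse_embeddings` and the mixing weight `alpha` are hypothetical, and the paper's actual fusion operator may differ.

```python
import numpy as np

def fuse_embeddings(phonetic_vec, written_vec, alpha=0.5):
    """Combine a phonetic and a written embedding into one word vector.

    Hedged sketch: assumes weighted concatenation of unit-normalized
    views, one common way to merge two modality-specific embeddings.
    `alpha` (hypothetical) balances the two carriers.
    """
    p = phonetic_vec / np.linalg.norm(phonetic_vec)  # unit-normalize speech view
    w = written_vec / np.linalg.norm(written_vec)    # unit-normalize writing view
    return np.concatenate([alpha * p, (1.0 - alpha) * w])

# Toy example: a 3-d phonetic view and a 3-d written view give a 6-d word vector.
phon = np.array([1.0, 0.0, 0.0])
writ = np.array([0.0, 1.0, 0.0])
vec = fuse_embeddings(phon, writ)
assert vec.shape == (6,)
```

In this sketch the fused vector keeps both carriers' information side by side, so downstream tasks (word similarity, text classification) can draw on either view.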
