论文信息 - Learning Distributional Token Representations from Visual Features

Learning Distributional Token Representations from Visual Features

In this study, we compare token representations constructed from visual features (i.e., pixels) with standard lookup-based embeddings. Our goal is to gain insight about the challenges of encoding a text representation from low-level features, e.g. from characters or pixels. We focus on Chinese, which—as a logographic language—has properties that make a representation via visual features challenging and interesting. To train and evaluate different models for the token representation, we chose the task of character-based neural machine translation (NMT) from Chinese to English. We found that a token representation computed only from visual features can achieve competitive results to lookup embeddings. However, we also show different strengths and weaknesses in the models’ performance in a part-of- speech tagging task and also a semantic similarity task. In summary, we show that it is possible to achieve a text representation only from pixels. We hope that this is a useful stepping stone for future studies that exclusively rely on visual input, or aim at exploiting visual features of written language.

Samuel Broscheit | Samuel Broscheit

[1] Christopher D. Manning,et al. Effective Approaches to Attention-based Neural Machine Translation , 2015, EMNLP.

[2] Wei Chen,et al. Sogou Neural Machine Translation Systems for WMT17 , 2017, WMT.

[3] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[4] Alexander M. Rush,et al. Character-Aware Neural Language Models , 2015, AAAI.

[5] Alexander M. Rush,et al. Image-to-Markup Generation with Coarse-to-Fine Attention , 2016, ICML.

[6] Fei Xia,et al. The Penn Chinese TreeBank: Phrase structure annotation of a large corpus , 2005, Natural Language Engineering.

[7] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[8] Desmond Elliott,et al. Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description , 2017, WMT.

[9] Marta R. Costa-jussà,et al. Chinese–Spanish neural machine translation enhanced with character and word bitmap fonts , 2017, Machine Translation.

[10] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..

[11] Andrew McCallum,et al. Fast and Accurate Entity Recognition with Iterated Dilated Convolutions , 2017, EMNLP.