Better Word Embeddings for Korean

Vector representations of words that accurately capture semantic and syntactic information are critical to the performance of models that use these vectors as inputs. Algorithms that use only word-level context ignore subword relationships, which carry important meaning, especially for highly inflected languages such as Korean. In this paper we compare word vectors generated by incorporating different levels of subword information, visualized with t-SNE, on a small Korean dataset.
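To make the notion of "levels of subword information" concrete, the sketch below (not from the paper; a minimal illustration) extracts two such levels for a Korean word: fastText-style character n-grams over syllables, and a jamo (letter-level) decomposition using the standard Unicode arithmetic for Hangul syllable blocks.

```python
# Two levels of Korean subword units: syllable n-grams and jamo letters.
# Hangul syllables occupy U+AC00..U+D7A3 and encode as
#   code = 0xAC00 + (initial*21 + medial)*28 + final
# with 19 initial consonants, 21 vowels, and 28 finals (index 0 = none).

CHOSEONG = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
JUNGSEONG = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")
JONGSEONG = [""] + list("ㄱㄲ") + ["ㄳ"] + list("ㄴ") + ["ㄵ", "ㄶ"] + \
    list("ㄷㄹ") + ["ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ"] + \
    list("ㅁㅂ") + ["ㅄ"] + list("ㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

def decompose_jamo(word):
    """Split each Hangul syllable into its jamo letters."""
    jamo = []
    for ch in word:
        code = ord(ch) - 0xAC00
        if not 0 <= code <= 0x2BA3:  # not a precomposed syllable
            jamo.append(ch)
            continue
        jamo.append(CHOSEONG[code // (21 * 28)])
        jamo.append(JUNGSEONG[(code // 28) % 21])
        final = JONGSEONG[code % 28]
        if final:
            jamo.append(final)
    return jamo

def char_ngrams(word, n):
    """FastText-style n-grams over syllables, with boundary markers."""
    padded = "<" + word + ">"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

print(decompose_jamo("한국"))      # → ['ㅎ', 'ㅏ', 'ㄴ', 'ㄱ', 'ㅜ', 'ㄱ']
print(char_ngrams("한국", 2))      # → ['<한', '한국', '국>']
```

Embedding models can then associate a vector with each unit at the chosen level and sum them into a word vector, which is the general mechanism the compared methods vary.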