단어와 자소 기반 합성곱 신경망을 이용한 문서 분류/Text Classification based on Convolutional Neural Network with word and character level

Document classification aims to analyze the keywords or contextual meaning of a given document and assign it to a specific category. To classify documents successfully, the word information contained in the document must be extracted accurately. However, Korean words take many variant forms depending on their postpositions, roots, and endings, and this variation is even more pronounced in online documents. Considering these characteristics of Korean documents, we propose a document classification method that uses both word and character information. Character information makes it possible to take into account, during classification, information that is difficult to express with a word set, such as typos and emoticons. The proposed model combines the features of the whole sentence obtained from word information with the local features obtained from character information, and we experimentally confirmed that it achieves higher classification performance than existing models that use only word information.
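
The combination of word-level and character-level features described above can be illustrated with a minimal sketch. It is not the model evaluated in this work: the abstract does not specify the architecture, so the two-branch layout (convolution, ReLU, max-over-time pooling, then concatenation), the filter widths, the dimensions, the toy vocabulary `WORD_VOCAB`, and all function names are assumptions made for illustration. Weights are random and untrained, and the final classification layer is omitted.

```python
import random

# Hypothetical sizes and vocabulary, chosen only for this illustration.
WORD_DIM, CHAR_DIM = 4, 3
WORD_WIDTH, CHAR_WIDTH = 2, 3
N_WORD_FILTERS, N_CHAR_FILTERS = 5, 4
WORD_VOCAB = {"good", "bad", "movie"}


def make_embedding(dim, seed):
    """Return a lookup that lazily assigns a fixed random vector to each token."""
    rng, table = random.Random(seed), {}

    def lookup(token):
        if token not in table:
            table[token] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        return table[token]

    return lookup


def make_filters(count, width, dim, seed):
    """Create `count` random filters, each spanning `width` consecutive vectors."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(width * dim)]
            for _ in range(count)]


def conv_max_pool(vectors, filters, width, dim):
    """1-D convolution with ReLU, followed by max-over-time pooling."""
    # Pad short inputs with zero vectors so at least one window exists.
    padded = vectors + [[0.0] * dim] * max(0, width - len(vectors))
    features = []
    for weights in filters:
        best = 0.0  # ReLU output is never below zero
        for start in range(len(padded) - width + 1):
            window = [x for vec in padded[start:start + width] for x in vec]
            activation = sum(w * x for w, x in zip(weights, window))
            best = max(best, activation)
        features.append(best)
    return features


word_embed = make_embedding(WORD_DIM, seed=0)
char_embed = make_embedding(CHAR_DIM, seed=1)
word_filters = make_filters(N_WORD_FILTERS, WORD_WIDTH, WORD_DIM, seed=2)
char_filters = make_filters(N_CHAR_FILTERS, CHAR_WIDTH, CHAR_DIM, seed=3)


def word_features(text):
    """Word branch: out-of-vocabulary words (e.g. typos) collapse to <unk>."""
    tokens = [w if w in WORD_VOCAB else "<unk>" for w in text.split()]
    vectors = [word_embed(t) for t in tokens]
    return conv_max_pool(vectors, word_filters, WORD_WIDTH, WORD_DIM)


def char_features(text):
    """Character branch: every character, including emoticon symbols, is kept."""
    vectors = [char_embed(c) for c in text]
    return conv_max_pool(vectors, char_filters, CHAR_WIDTH, CHAR_DIM)


def extract_features(text):
    """Concatenate both branches into one vector for a downstream classifier."""
    return word_features(text) + char_features(text)


if __name__ == "__main__":
    print(extract_features("gooood movie ^^"))
```

In this sketch, "gooood movie" and "baaaad movie" yield identical word-branch features because both misspelled words fall outside the toy vocabulary, while the character branch still receives the different character sequences. This is the kind of information the character input is meant to preserve.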