Word2Vec-based spelling correction method for Twitter messages

Twitter became popular owing to devices such as smartphones and tablets, on which short messages can be composed easily. As Twitter's popularity has grown, the volume of Twitter messages has increased rapidly, and studies have accordingly been carried out to extract various data by analyzing these messages. However, mining accurate data from Twitter messages, which are composed of short sentences, is limited by persistent misspellings. Although studies continue to apply Word2Vec to spelling correction, they can be said to replace the extracted words using Word2Vec rather than correct them. Furthermore, because the characters of the misspelled word are not taken into consideration, such approaches do not fit the meaning of correction. This paper proposes a method of correcting misspelled words in Twitter messages using an improved Word2Vec. Misspelled words in a Twitter message are selected in a pre-processing step. For each selected misspelled word, candidate correction words are extracted through the improved Word2Vec. Among the extracted candidates, the word with the highest similarity to the misspelled word replaces it, thereby correcting the spelling error.
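
The three steps described above (select misspelled words, gather candidates, keep the most similar candidate) can be sketched roughly as follows. This is an illustrative outline only, not the paper's implementation: the improved Word2Vec is not specified here, so candidate extraction is represented by a hypothetical `candidates_for` callable, the dictionary lookup in `find_misspelled` is an assumed stand-in for the pre-processing step, and character-level similarity is approximated with Python's `difflib.SequenceMatcher`, which is a swapped-in measure rather than the one the paper uses.

```python
from difflib import SequenceMatcher

# Hypothetical dictionary of correctly spelled words (an assumption for this sketch).
VOCAB = {"see", "you", "tomorrow", "tonight", "today"}


def find_misspelled(tokens, vocab=VOCAB):
    """Pre-processing stand-in: flag tokens that are not in the dictionary."""
    return [t for t in tokens if t.lower() not in vocab]


def char_similarity(a, b):
    """Character-level similarity in [0, 1]; a stand-in for the paper's measure."""
    return SequenceMatcher(None, a, b).ratio()


def correct(word, candidates):
    """Return the candidate with the highest similarity to the misspelled word."""
    return max(candidates, key=lambda c: char_similarity(word, c))


def correct_message(message, candidates_for):
    """Correct each misspelled token in a whitespace-tokenized message.

    `candidates_for` is a hypothetical callable mapping a misspelled word to
    candidate corrections; in the paper this role is played by the improved
    Word2Vec.
    """
    tokens = message.split()
    misspelled = set(find_misspelled(tokens))
    return " ".join(
        correct(t, candidates_for(t)) if t in misspelled else t for t in tokens
    )
```

For example, with a fixed candidate list such as `lambda w: ["tomorrow", "tonight", "today"]` passed as `candidates_for`, the message `"see you tomorow"` has its last token replaced by the candidate whose characters overlap most with it. Scoring on characters is what distinguishes correction from simply substituting a contextually similar word.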