Twitter became popular owing to devices such as smartphones and tablets, on which short messages can be composed easily. As its popularity has grown, the volume of Twitter messages has increased rapidly, and studies have accordingly been carried out to extract various kinds of data by analyzing these messages. However, mining accurate data from Twitter messages, which are composed of short sentences, is limited by the persistent problem of misspelling. Although studies continue to apply Word2Vec to spelling correction, they can be said to replace the extracted words using Word2Vec rather than to correct them. Furthermore, because the characters of the misspelled word are not taken into consideration, such replacement does not fit the meaning of correction. This paper proposes a method of correcting misspelled words in Twitter messages by using an improved Word2Vec. Misspelled words in a Twitter message are selected through a pre-processing step. For each selected misspelled word, candidate correction words are extracted through the improved Word2Vec. Among the extracted candidates, the word with the highest similarity to the misspelled word replaces it, thereby correcting the spelling error.
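The selection step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the improved Word2Vec is not specified here, so candidate extraction is represented by a hypothetical `candidates_for` callable, and character-level similarity is approximated with the standard-library `difflib.SequenceMatcher` ratio. The names `char_similarity`, `pick_correction`, and `correct_message`, and the use of a plain vocabulary set to flag misspelled tokens, are likewise assumptions made for illustration.

```python
from difflib import SequenceMatcher


def char_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (assumed stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pick_correction(misspelled: str, candidates: list[str]) -> str:
    """Return the candidate most similar to the misspelled word.

    If no candidates are available, the word is returned unchanged.
    """
    if not candidates:
        return misspelled
    return max(candidates, key=lambda c: char_similarity(misspelled, c))


def correct_message(tokens, vocabulary, candidates_for):
    """Replace out-of-vocabulary tokens with their best-scoring candidate.

    `vocabulary` is a set of known words (a simplified pre-processing step);
    `candidates_for` is a hypothetical callable standing in for candidate
    extraction by an embedding model.
    """
    corrected = []
    for tok in tokens:
        if tok.lower() in vocabulary:
            corrected.append(tok)  # treated as correctly spelled
        else:
            corrected.append(pick_correction(tok, candidates_for(tok)))
    return corrected


if __name__ == "__main__":
    vocab = {"see", "you", "tomorrow"}
    fake_candidates = lambda tok: ["tomorrow", "today", "tonight"]
    print(correct_message(["see", "you", "tommorow"], vocab, fake_candidates))
```

In practice the candidate list would come from nearest neighbours in an embedding space, and the similarity score could combine embedding similarity with a character-level measure; the sketch keeps only the character-level part so that it runs without a trained model.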