SMS-2008: An Annotated Chinese Short Messages Corpus

With the growing popularity of short messages, smart SMS tools are urgently demanded by users, operators, and government departments. However, for technical, copyright, privacy, and various other reasons, there is no open standard SMS corpus, although such a corpus is an indispensable resource for algorithm research, system development, and performance testing. SMS-2008, an annotated Chinese SMS corpus, takes the lead in establishing a multi-purpose Chinese text message corpus. It includes the original corpus, a privacy-tagged corpus, a content-tagged corpus, and an error-tagged corpus. The corpus can be applied to research on SMS language, SMS classification, privacy protection algorithms, and automatic error correction systems.