A large-scale training corpus of microblogs belonging to a desired category is important for high-accuracy microblog retrieval. Obtaining such a large-scale microblogging corpus manually is very time-consuming and labor-intensive. Therefore, several models for the automatic retrieval of microblogs from an external corpus have been proposed. However, these approaches may fail to take microblog-specific features into account. To alleviate this issue, we propose a methodology that constructs a simulated microblogging corpus rather than directly building a model from the external corpus. Our model performs better because the retrieval model ultimately uses the microblog-specific knowledge of the microblogging corpus. Experimental results on real-world microblogs demonstrate the superiority of our technique over previous approaches.