WeiboFinder: A Topic-Based Chinese Word Finding and Learning System

With the explosive growth of user-generated data on social media websites such as Twitter and Weibo, much research has been conducted on using this data for web-based learning. Finding the data that users want in an effective way is critical for language learners. Social media websites provide diverse data for language learners, and some new words, such as cyberspeak, can only be learned in these online communities. In this paper, we present a system called WeiboFinder that suggests topic-based words and documents related to a target word for learners of Chinese. All the words and documents come from the Chinese social media website Weibo, one of the largest microblogging websites in China, with functions similar to those of Twitter. The experimental results show that the proposed method is effective and outperforms other methods. The topics produced by our method are more interpretable, and the topic-based words are useful for learners of Chinese.
