Ensemble of Feature Sets and Classification Methods for Stance Detection

Stance detection is the task of automatically determining the author’s favorability towards a given target. However, the target may not be explicitly mentioned in the text and even someone may refer some positive opinions to against the target, which make the task more difficult. In this paper, we describe an ensemble framework which integrates various feature sets and classification methods, and does not consist any handcrafted templates or rules to help stance detection. We submit our solution to NLPCC 2016 shared task: Detecting Stance in Chinese Weibo (Task A), which is a supervised task towards five targets. The official results show that our solution of the team “CBrain” achieves one 1st place and one 2nd place on these targets, and the overall ranking is 4th out of 16 teams. Our code is available at https://github.com/jacoxu/2016NLPCC_Stance_Detection.

[1]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[2]  Min Wu,et al.  Repeat Buyer Prediction for E-Commerce , 2016, KDD.

[3]  Yiming Yang,et al.  A Comparative Study on Feature Selection in Text Categorization , 1997, ICML.

[4]  Rui Xia,et al.  Ensemble of feature sets and classification algorithms for sentiment classification , 2011, Inf. Sci..

[5]  Mikhail Belkin,et al.  Laplacian Eigenmaps for Dimensionality Reduction and Data Representation , 2003, Neural Computation.

[6]  Bing Liu,et al.  Mining and summarizing customer reviews , 2004, KDD.

[7]  Mengen Chen,et al.  Short Text Classification Improved by Learning Multi-Granularity Topics , 2011, IJCAI.

[8]  Maosong Sun,et al.  Experimental Study on Sentiment Classification of Chinese Review using Machine Learning Techniques , 2007, 2007 International Conference on Natural Language Processing and Knowledge Engineering.

[9]  A. Atiya,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2005, IEEE Transactions on Neural Networks.

[10]  Yoav Freund,et al.  Experiments with a New Boosting Algorithm , 1996, ICML.

[11]  Ted Dunning,et al.  Accurate Methods for the Statistics of Surprise and Coincidence , 1993, CL.

[12]  Marc'Aurelio Ranzato,et al.  Ensemble of Generative and Discriminative Techniques for Sentiment Analysis of Movie Reviews , 2014, ICLR.

[13]  Josef Steinberger,et al.  UWB at SemEval-2016 Task 6: Stance Detection , 2016, *SEMEVAL.

[14]  Thorsten Joachims,et al.  Learning to classify text using support vector machines - methods, theory and algorithms , 2002, The Kluwer international series in engineering and computer science.

[15]  Saif Mohammad,et al.  SemEval-2016 Task 6: Detecting Stance in Tweets , 2016, *SEMEVAL.

[16]  T. Landauer,et al.  Indexing by Latent Semantic Analysis , 1990 .

[17]  Quoc V. Le,et al.  Distributed Representations of Sentences and Documents , 2014, ICML.

[18]  Soo-Min Kim,et al.  Crystal: Analyzing Predictive Opinions on the Web , 2007, EMNLP.

[19]  Susumu Horiguchi,et al.  Learning to classify short and sparse text & web with hidden topics from large-scale data collections , 2008, WWW.

[20]  Wei-Ying Ma,et al.  Locality preserving indexing for document representation , 2004, SIGIR '04.

[21]  Kazi Saidul Predicting Stance in Ideological Debate with Rich Linguistic Knowledge , 2012 .

[22]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[23]  Kalina Bontcheva,et al.  Monolingual Social Media Datasets for Detecting Contradiction and Entailment , 2016, LREC.