Feature selection and data sampling methods for learning reputation dimensions: The University of Amsterdam at RepLab 2014

We report on our participation in the reputation dimension task of the CLEF RepLab 2014 evaluation initiative, i.e., classifying social media updates into eight predefined categories. We address the task by using corpus-based methods to extract textual features from the labeled training data. We then use these features to train two classifiers in a supervised way. We explore three sampling strategies for selecting training examples and probe their effect on classification performance. We find that all our submitted runs outperform the baseline. We also find that elaborate feature selection methods, coupled with balanced datasets, help improve classification accuracy.
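The abstract credits balanced training data with part of the accuracy gain, but it does not name the three sampling strategies. The sketch below shows one common way to build a balanced set: random undersampling, where every category is downsampled to the size of the smallest one. This is an assumption chosen for illustration, not necessarily one of the authors' strategies. The function name `undersample_balanced` and the toy category labels are hypothetical.

```python
import random
from collections import Counter, defaultdict


def undersample_balanced(examples, seed=0):
    """Return a class-balanced subset of (text, label) pairs.

    Hypothetical helper: every label is randomly downsampled (without
    replacement) to the size of the smallest label group. Random
    undersampling is an assumed strategy, used here only for illustration.
    """
    # Group the labeled examples by their category label.
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    if not by_label:
        return []

    # The smallest class determines how many examples each class keeps.
    n_min = min(len(items) for items in by_label.values())

    # A seeded RNG makes the sample reproducible across runs.
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], n_min))

    # Shuffle so the classes are interleaved rather than grouped.
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    # Toy, made-up data with three imbalanced (hypothetical) categories.
    toy = ([(f"p{i}", "products") for i in range(6)]
           + [(f"g{i}", "governance") for i in range(2)]
           + [(f"w{i}", "workplace") for i in range(3)])
    balanced = undersample_balanced(toy)
    # Each of the three labels now appears exactly 2 times.
    print(Counter(label for _, label in balanced))
```

Undersampling discards data from the larger categories. Oversampling the smaller ones (sampling with replacement) is the usual alternative when the minority classes are very small.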

M. de Rijke | Manos Tsagkias | Cristina Gârbacea
