Experiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling: Notebook for PAN at CLEF 2013
Inspired by our ongoing work in the WENDY project, which addresses age detection in social networks through linguistic processing (among other methods), we have built a system for the Spanish Author Profiling task at PAN 2013. The system combines linguistic resources (a Spanish dictionary and an SMS-language dictionary) with text-processing algorithms (custom tokenization of text utterances, SMS-to-standard-Spanish translation, and a number of normalization rules), and applies a learning-based approach using a custom Stochastic Gradient Descent algorithm adapted to text. We believe the results obtained in internal testing, on a validation set extracted from the training dataset, validate our approach in WENDY, although the results obtained in the PAN task are not as good as expected.
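To make the pipeline described above more concrete, the following is a minimal sketch and not the system from the paper. It assumes a tiny, hypothetical SMS dictionary (`SMS_DICT`), a single illustrative normalization rule (squeezing letters repeated three or more times), and a plain logistic-regression model trained with per-example gradient updates over binary bag-of-words features. The paper's custom Stochastic Gradient Descent variant, its dictionaries, and its rule set are not reproduced here; all names, toy messages, and labels are invented for illustration.

```python
import math
import re
from collections import defaultdict

# Hypothetical SMS-to-Spanish entries (assumption: the paper's dictionary is not shown).
SMS_DICT = {"q": "que", "xq": "porque", "tb": "también", "bn": "bien",
            "tqm": "te quiero mucho"}


def normalize(text):
    """Lowercase, squeeze runs of 3+ identical characters, tokenize,
    and expand SMS abbreviations into standard Spanish tokens."""
    text = re.sub(r"(.)\1{2,}", r"\1", text.lower())
    tokens = []
    for tok in re.findall(r"\w+", text):
        tokens.extend(SMS_DICT.get(tok, tok).split())
    return tokens


def train_sgd(samples, epochs=20, lr=0.5):
    """Train logistic regression with one gradient step per example.

    samples: list of (tokens, label) pairs with label in {0, 1}.
    Examples are visited in a fixed order to keep the sketch deterministic.
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for tokens, label in samples:
            feats = set(tokens)  # binary bag-of-words features
            z = bias + sum(weights[t] for t in feats)
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - label  # derivative of the log loss w.r.t. z
            bias -= lr * grad
            for t in feats:
                weights[t] -= lr * grad
    return weights, bias


def predict(weights, bias, tokens):
    """Return 1 if the linear score is positive, else 0."""
    z = bias + sum(weights.get(t, 0.0) for t in set(tokens))
    return 1 if z > 0 else 0


# Toy data (invented): label 1 = SMS-style message, label 0 = formal message.
RAW = [
    ("xq no vienes tb", 1),
    ("tqm holaaaa", 1),
    ("Estimado señor, le escribo por el informe", 0),
    ("La reunión es mañana por la tarde", 0),
]
SAMPLES = [(normalize(text), label) for text, label in RAW]
WEIGHTS, BIAS = train_sgd(SAMPLES)
```

Translating abbreviations before feature extraction means that an SMS form such as `xq` and the standard form `porque` map to the same feature, which is one plausible reason for normalizing text ahead of the learning step.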