Robust Language Learning via Efficient Budgeted Online Algorithms

In many Natural Language Processing tasks, kernel learning makes it possible to build robust and effective systems. At the same time, Online Learning Algorithms are appealing for their incremental and continuous learning capability: they can track a target problem, adapting constantly to a dynamic environment. The drawback of using kernels in online settings is the continuous growth in complexity, in terms of time and memory usage, experienced in both the learning and classification phases. In this paper, we extend a state-of-the-art Budgeted Online Learning Algorithm that efficiently constrains the overall complexity. We introduce the principles of Fairness and Weight Adjustment: the former mitigates the effect of unbalanced datasets, while the latter improves the stability of the resulting models. Using robust semantic kernel functions for Sentiment Analysis in Twitter improves results with respect to the standard budgeted formulation. Performance is comparable to that of one of the most efficient Support Vector Machine implementations, while still preserving all the advantages of online methods. These results are remarkable considering that the task is tackled without manually coded resources (e.g., WordNet or a polarity lexicon), relying mainly on the distributional analysis of unlabeled corpora.
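To make the setting concrete, the following is a minimal illustrative sketch (not the paper's exact algorithm) of a budgeted online kernel learner: a kernel perceptron whose support set is capped at a budget `B` by evicting the oldest support vector, with updates scaled by a per-class weight to mitigate class imbalance, in the spirit of the Fairness principle described above. All names (`BudgetedKernelPerceptron`, `partial_fit`, the eviction policy) are assumptions for illustration.

```python
import math

class BudgetedKernelPerceptron:
    """Illustrative budgeted online kernel learner: when the support set
    exceeds the budget, the oldest support vector is discarded; mistake
    updates are scaled by a per-class weight to counter class imbalance.
    This is a sketch, not the paper's budgeted Passive-Aggressive method."""

    def __init__(self, kernel, budget=50, class_weights=None):
        self.kernel = kernel
        self.budget = budget
        self.class_weights = class_weights or {}
        self.support = []  # list of (x_i, alpha_i) pairs

    def decision(self, x):
        # f(x) = sum_i alpha_i * K(x_i, x), over the bounded support set
        return sum(a * self.kernel(xi, x) for xi, a in self.support)

    def partial_fit(self, x, y):
        # y in {-1, +1}; update only on mistakes (perceptron rule)
        if y * self.decision(x) <= 0:
            w = self.class_weights.get(y, 1.0)
            self.support.append((x, w * y))
            if len(self.support) > self.budget:
                self.support.pop(0)  # evict oldest to respect the budget

def rbf(u, v, gamma=0.5):
    # Gaussian (RBF) kernel on plain tuples of floats
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))
```

Because the support set never exceeds the budget, both update and classification cost stay bounded over an unbounded stream, which is the key property that makes online kernel methods tractable in this setting; more refined budget-maintenance strategies (e.g., merging or projecting support vectors, as in budgeted Passive-Aggressive algorithms) trade extra per-update work for smaller accuracy loss.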
