Grammatical Feature Engineering for Fine-grained IR Tasks

Information Retrieval (IR) tasks increasingly involve complex information in order to address contemporary challenges such as Opinion Mining (OM) and Question Answering (QA). In these tasks, complex linguistic information is required to achieve reasonable performance on realistic data sets. Natural language learning is usually applied to these tasks, but rich structures such as parse trees are problematic because they require complex resources and accurate pre-processing. In this paper, we show how good-quality language learning methods can be applied to the above tasks using grammatical representations simpler than parse trees. These features are shown here to achieve state-of-the-art accuracy in different IR tasks, such as OM and QA.
