Semantic Tree Kernels for Statistical Natural Language Learning

A central topic in Natural Language Processing (NLP) is the design of effective linguistic processors suited to their target applications. In this setting, Convolution Kernels provide a powerful way to apply Machine Learning algorithms directly to complex structures that represent linguistic information. This work defines the semantically Smoothed Partial Tree Kernel (SPTK), a generalization of one of the best-performing Convolution Kernels, the Tree Kernel (TK), that extends the similarity between tree structures with similarities between nodes. The main characteristic of SPTK is its ability to measure the similarity between syntactic tree structures that are only partially similar and whose nodes may differ but are nevertheless semantically related. One of the most important outcomes is that SPTK allows external lexical information to be embedded in the kernel function solely through a similarity function among lexical nodes. SPTK has been evaluated on three complex automatic Semantic Processing tasks: Question Classification in Question Answering, Verb Classification, and Semantic Role Labeling. Although these tasks address different problems, state-of-the-art results were achieved in every evaluation.
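As an illustration of the idea described above, the following is a minimal sketch of a tree kernel whose node matches are smoothed by a lexical similarity function. It is a simplified toy under stated assumptions, not the formulation or implementation used in this work: the names `Node`, `sigma`, `delta`, `smoothed_tree_kernel`, and the `TOY_SIM` table are hypothetical, the decay values `lam` and `mu` are arbitrary, and child subsequences are enumerated naively rather than with an efficient dynamic-programming recursion.

```python
from itertools import combinations


class Node:
    """A labeled tree node; leaves are nodes without children."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)


# Hypothetical toy lexical similarities. A real system might derive such
# scores from a distributional model or a lexical resource.
TOY_SIM = {frozenset(("cat", "dog")): 0.8}


def sigma(a, b):
    """Node similarity: 1 for identical labels, else a toy lookup (default 0)."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)


def delta(n1, n2, lam=0.4, mu=0.4):
    """Smoothed similarity of the tree fragments rooted at n1 and n2."""
    s = sigma(n1.label, n2.label)
    if s == 0.0:
        return 0.0
    if not n1.children and not n2.children:
        return mu * lam * s
    c1, c2 = n1.children, n2.children
    total = lam ** 2
    # Naive enumeration of equal-length child subsequences (gaps allowed).
    for length in range(1, min(len(c1), len(c2)) + 1):
        for i1 in combinations(range(len(c1)), length):
            for i2 in combinations(range(len(c2)), length):
                # Subsequences spanning more children are penalized more.
                span = (i1[-1] - i1[0] + 1) + (i2[-1] - i2[0] + 1)
                prod = 1.0
                for a, b in zip(i1, i2):
                    prod *= delta(c1[a], c2[b], lam, mu)
                    if prod == 0.0:
                        break
                total += (lam ** span) * prod
    return mu * s * total


def nodes(tree):
    """Yield every node of the tree in pre-order."""
    yield tree
    for child in tree.children:
        yield from nodes(child)


def smoothed_tree_kernel(t1, t2, lam=0.4, mu=0.4):
    """Sum delta over all node pairs of the two trees."""
    return sum(delta(a, b, lam, mu) for a in nodes(t1) for b in nodes(t2))


def noun_phrase(noun):
    """Build the toy tree (NP (DT the) (NN noun))."""
    return Node("NP", [Node("DT", [Node("the")]), Node("NN", [Node(noun)])])


t_cat, t_dog, t_car = noun_phrase("cat"), noun_phrase("dog"), noun_phrase("car")
k_same = smoothed_tree_kernel(t_cat, t_cat)
k_related = smoothed_tree_kernel(t_cat, t_dog)
k_unrelated = smoothed_tree_kernel(t_cat, t_car)
```

In this toy setup the three trees share the same syntactic structure and differ only in one lexical leaf. Because `sigma` assigns a nonzero score to the pair "cat"/"dog" and zero to "cat"/"car", `k_related` falls between `k_unrelated` and `k_same`, which is the kind of graded, partially lexical match the abstract describes. Swapping in a different `sigma` changes the lexical knowledge without touching the rest of the kernel.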
