Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Abstract

Relation extraction is a challenging task in natural language processing. Syntactic features have recently been shown to be quite effective for relation extraction. In this paper, we generalize the state-of-the-art syntactic convolution tree kernel introduced by Collins and Duffy. The proposed generalized kernel is more flexible and customizable, and can be conveniently used for the systematic generation of more effective application-specific syntactic sub-kernels. Using the generalized kernel, we also propose a number of novel syntactic sub-kernels for relation extraction. These kernels show a remarkable performance improvement over the original Collins and Duffy kernel in the extraction of ACE-2005 relation types.

1 Introduction

One of the demanding contemporary NLP tasks is information extraction, which is the procedure of extracting structured information such as entities, relations, and events from free-text documents. As an information extraction sub-task, semantic relation extraction is the procedure of finding predefined semantic relations between textual entity mentions. For instance, assuming a semantic relation with type Physical and subtype Located between an entity of type Person and another entity of type Location, the sentence "Police arrested Mark at the airport last week." conveys two mentions of this relation, one between "Mark" and "airport" and one between "police" and "airport", which can be shown in the following format.

Phys.Located(Mark, airport)
Phys.Located(police, airport)

Relation extraction is a key step towards question answering systems, by which vital structured data is acquired from underlying free-text resources. Detection of protein interactions in biomedical corpora (Li et al., 2008) is another valuable application of relation extraction. Relation extraction can be approached with a standard classification learning method. In particular, we use SVM (Cortes and Vapnik, 1995) with kernel functions as our classification method.
A kernel is a function that calculates the inner product of two transformed vectors in a high-dimensional feature space using the original feature vectors, as shown in eq. 1.

$K(X_i, X_j) = \phi(X_i) \cdot \phi(X_j)$ (1)
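The kernel definition above can be illustrated with a small sketch. It is not taken from the paper: the degree-2 polynomial kernel, the explicit feature map `phi`, and the sample vectors are all assumptions chosen only to show that a kernel evaluated on the original vectors can match an inner product in a larger transformed space.

```python
from itertools import product


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Hypothetical explicit feature map: all ordered pairwise products x_a * x_b.

    For an n-dimensional input this produces an n*n-dimensional vector.
    """
    return [a * b for a, b in product(x, repeat=2)]


def quadratic_kernel(x, y):
    """Degree-2 polynomial kernel, computed from the original vectors only."""
    return dot(x, y) ** 2


# Illustrative vectors (not data from the paper).
x = [1.0, 2.0, 3.0]
y = [0.5, -1.0, 2.0]

explicit = dot(phi(x), phi(y))     # inner product in the 9-dimensional space
implicit = quadratic_kernel(x, y)  # same value without building phi(x), phi(y)

print(explicit, implicit)
```

Both values agree, but the kernel never materializes the transformed vectors. This is what makes kernels practical when the transformed space is very large, as is the case for feature spaces indexed by parse-tree fragments.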

[1]  Ian H. Witten,et al.  Weka: Practical machine learning tools and techniques with Java implementations , 1999 .

[2]  Paul Taylor,et al.  Hidden Markov models for grapheme to phoneme conversion , 2005, INTERSPEECH.

[3]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[4]  Robert I. Damper,et al.  A multistrategy approach to improving pronunciation by analogy , 2000, CL.

[5]  Patrick Pantel,et al.  VerbOcean: Mining the Web for Fine-Grained Semantic Verb Relations , 2004, EMNLP.

[6]  Carlo Strapparava,et al.  WordNet Affect: an Affective Extension of WordNet , 2004, LREC.

[7]  Daniel Jurafsky,et al.  Shallow Semantic Parsing using Support Vector Machines , 2004, NAACL.

[8]  Marc Schröder,et al.  Dimensional Emotion Representation as a Basis for Speech Synthesis with Non-extreme Emotions , 2004, ADS.

[9]  Alessandro Moschitti.  Syntactic Kernels for Natural Language Learning: the Semantic Role Labeling Case , 2006, HLT-NAACL.

[10]  Antal van den Bosch,et al.  Improved morpho-phonological sequence processing with constraint satisfaction inference , 2006, SIGMORPHON.

[11]  Michael Collins,et al.  Convolution Kernels for Natural Language , 2001, NIPS.

[12]  Eugene Charniak,et al.  A Maximum-Entropy-Inspired Parser , 2000, ANLP.

[13]  Andreas Stolcke,et al.  Does active learning help automatic dialog act tagging in meeting data? , 2005, INTERSPEECH.

[14]  Razvan C. Bunescu,et al.  Subsequence Kernels for Relation Extraction , 2005, NIPS.

[15]  Anton Nijholt,et al.  Dialogue Act Recognition with Bayesian Networks for Dutch Dialogues , 2002, SIGDIAL Workshop.

[16]  Bernhard E. Boser,et al.  A training algorithm for optimal margin classifiers , 1992, COLT '92.

[17]  Bing Liu,et al.  Mining Opinion Features in Customer Reviews , 2004, AAAI.

[18]  A. Koller,et al.  Speech Acts: An Essay in the Philosophy of Language , 1969 .

[19]  Philipp Koehn,et al.  Moses: Open Source Toolkit for Statistical Machine Translation , 2007, ACL.

[20]  Jian Su,et al.  Exploring Various Knowledge in Relation Extraction , 2005, ACL.

[21]  T. Chartrand,et al.  The chameleon effect: the perception-behavior link and social interaction. , 1999, Journal of personality and social psychology.

[22]  Bo Pang,et al.  Thumbs up? Sentiment Classification using Machine Learning Techniques , 2002, EMNLP.

[23]  Vladimir I. Levenshtein,et al.  Binary codes capable of correcting deletions, insertions, and reversals , 1965 .

[24]  Christopher D. Manning,et al.  Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger , 2000, EMNLP.

[25]  William J. Cook,et al.  Solution of a Large-Scale Traveling-Salesman Problem , 1954, 50 Years of Integer Programming.

[26]  E. Schegloff,et al.  A simplest systematics for the organization of turn-taking for conversation , 1974 .

[27]  Janyce Wiebe,et al.  Preposition semantic classification via Penn Treebank and FrameNet , 2003, HLT-NAACL 2003.

[28]  Razvan C. Bunescu,et al.  Learning for information extraction: from named entity recognition and disambiguation to relation extraction , 2007 .

[29]  Bo Pang,et al.  A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts , 2004, ACL.

[30]  Nigel G. Ward,et al.  A combined method for discovering short-term affect-based response rules for spoken tutorial dialog , 2007, SLaTE.

[31]  Mark G. Core,et al.  Coding Dialogs with the DAMSL Annotation Scheme , 1997 .

[32]  Diane J. Litman,et al.  Predicting Emotion in Spoken Dialogue from Multiple Knowledge Sources , 2004, NAACL.

[33]  Grzegorz Kondrak,et al.  Alignment-Based Discriminative String Similarity , 2007, ACL.

[34]  Razvan C. Bunescu,et al.  A Shortest Path Dependency Kernel for Relation Extraction , 2005, HLT.

[35]  Peter W. Foltz,et al.  Textual coherence using latent semantic analysis , 1998 .

[36]  Hsinchun Chen,et al.  Kernel-based learning for biomedical relation extraction , 2008, J. Assoc. Inf. Sci. Technol..

[37]  Mirella Lapata,et al.  Probabilistic Text Structuring: Experiments with Sentence Ordering , 2003, ACL.

[38]  Koby Crammer,et al.  Online Large-Margin Training of Dependency Parsers , 2005, ACL.

[39]  Timothy Baldwin,et al.  MELB-YB: Preposition Sense Disambiguation Using Rich Semantic Features , 2007, Fourth International Workshop on Semantic Evaluations (SemEval-2007).

[40]  Jian Su,et al.  Exploring Syntactic Features for Relation Extraction using a Convolution Tree Kernel , 2006, NAACL.

[41]  Sanjeev Khudanpur,et al.  Transliteration of Proper Names in Cross-Lingual Information Retrieval , 2003, NER@ACL.

[42]  Ani Nenkova,et al.  Evaluating Content Selection in Summarization: The Pyramid Method , 2004, NAACL.

[43]  John J. Godfrey,et al.  SWITCHBOARD: telephone speech corpus for research and development , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[44]  Tingting He,et al.  CCNU at TAC 2008: Proceeding on Using Semantic Method for Automated Summarization Yield , 2008, TAC.

[45]  Peter D. Turney.  Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews , 2002, ACL.

[46]  Koby Crammer,et al.  Ultraconservative Online Algorithms for Multiclass Problems , 2001, J. Mach. Learn. Res..

[47]  Alessandro Moschitti,et al.  Making Tree Kernels Practical for Natural Language Learning , 2006, EACL.

[48]  Patrick F. Reidy.  An Introduction to Latent Semantic Analysis , 2009 .

[49]  Michael W. Berry,et al.  Large-Scale Sparse Singular Value Computations , 1992 .

[50]  Sebastian Riedel,et al.  Incremental Integer Linear Programming for Non-projective Dependency Parsing , 2006, EMNLP.

[51]  Carlo Strapparava,et al.  Domain Kernels for Text Categorization , 2005, CoNLL.

[52]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[53]  Soo-Min Kim,et al.  Determining the Sentiment of Opinions , 2004, COLING.

[54]  Janyce Wiebe,et al.  Just How Mad Are You? Finding Strong and Weak Opinion Clauses , 2004, AAAI.

[55]  Daniel Marcu,et al.  Fast and optimal decoding for machine translation , 2004, Artif. Intell..

[56]  Markus Dreyer,et al.  Latent-Variable Modeling of String Transductions with Finite-State Methods , 2008, EMNLP.

[57]  Alan W. Black,et al.  Issues in building general letter to sound rules , 1998, SSW.

[58]  John A. Carroll,et al.  Robust, applied morphological generation , 2000, INLG.

[59]  Michael Gamon,et al.  Sentiment classification on customer feedback data: noisy data, large feature vectors, and the role of linguistic analysis , 2004, COLING.

[60]  David R. Traum,et al.  20 Questions on Dialogue Act Taxonomies , 2000, J. Semant..

[61]  Alan W. Black,et al.  Learning Pronunciation Dictionaries: Language Complexity and Word Selection Strategies , 2006, NAACL.

[62]  Andreas Stolcke,et al.  SRILM - an extensible language modeling toolkit , 2002, INTERSPEECH.

[63]  Sung-Hyon Myaeng,et al.  Automatic identification and back-transliteration of foreign words for information retrieval , 1999, Inf. Process. Manag..

[64]  Kevin Knight,et al.  Machine Transliteration , 1997, CL.

[65]  Yaser Al-Onaizan,et al.  Machine Transliteration of Names in Arabic Texts , 2002, SEMITIC@ACL.

[66]  T. Joachims,et al.  Text Categorization with Support Vector Machines: Learning with Many Relevant Features , 1999, ECML.

[67]  Stephen G. Pulman,et al.  Sentence ordering with manifold-based classification in multi-document summarization , 2006, EMNLP.

[68]  Regina Barzilay,et al.  Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization , 2004, NAACL.

[69]  Vibhu O. Mittal,et al.  Comparative Experiments on Sentiment Classification for Online Product Reviews , 2006, AAAI.

[70]  Ning Wang,et al.  Can Virtual Humans Be More Engaging Than Real Ones? , 2007, HCI.

[71]  Thorsten Joachims,et al.  Text Categorization with Support Vector Machines: Learning with Many Relevant Features , 1998, ECML.

[72]  Claire Grover,et al.  Rule-Based Chunking and Reusability , 2006, LREC.

[73]  Arthur C. Graesser,et al.  Automatic detection of learner’s affect from conversational cues , 2008, User Modeling and User-Adapted Interaction.

[74]  Dan Klein,et al.  Accurate Unlexicalized Parsing , 2003, ACL.

[75]  David M. Pennock,et al.  Mining the peanut gallery: opinion extraction and semantic classification of product reviews , 2003, WWW '03.

[76]  George Forman,et al.  An Extensive Empirical Study of Feature Selection Metrics for Text Classification , 2003, J. Mach. Learn. Res..

[77]  Christopher D. Manning,et al.  Introduction to Information Retrieval , 2010, J. Assoc. Inf. Sci. Technol..

[78]  Justine Cassell,et al.  Negotiated Collusion: Modeling Social Language and its Relationship Effects in Intelligent Agents , 2003, User Modeling and User-Adapted Interaction.

[79]  Grzegorz Kondrak,et al.  Joint Processing and Discriminative Training for Letter-to-Phoneme Conversion , 2008, ACL.

[80]  Hermann Ney,et al.  Investigations on joint-multigram models for grapheme-to-phoneme conversion , 2002, INTERSPEECH.

[81]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[82]  Daniel Marcu,et al.  Statistical Phrase-Based Translation , 2003, NAACL.

[83]  Adrian Iftene,et al.  Hypothesis Transformation and Semantic Variability Rules Used in Recognizing Textual Entailment , 2007, ACL-PASCAL@ACL.

[84]  Adam L. Berger,et al.  A Maximum Entropy Approach to Natural Language Processing , 1996, CL.

[85]  Guodong Zhou,et al.  Tree Kernel-Based Relation Extraction with Context-Sensitive Structured Parse Tree Information , 2007, EMNLP.

[86]  Roger Levy,et al.  Tregex and Tsurgeon: tools for querying and manipulating tree data structures , 2006, LREC.

[87]  Ellen Riloff,et al.  Learning Extraction Patterns for Subjective Expressions , 2003, EMNLP.

[88]  Michel Généreux,et al.  Description of the LIPN Systems at TAC 2008: Summarizing Information and Opinions , 2008, TAC.

[89]  Danushka Bollegala,et al.  A Bottom-Up Approach to Sentence Ordering for Multi-Document Summarization , 2006, ACL.

[90]  Hermann Ney,et al.  The RWTH Phrase-based Statistical Machine Translation System , 2005, IWSLT.

[91]  R. Power.  The organisation of purposeful dialogues , 1979 .

[92]  Elena Lloret,et al.  The DLSIUAES Team's Participation in the TAC 2008 Tracks , 2008, TAC.

[93]  Fermín L. Cruz,et al.  The Italica System at TAC 2008 Opinion Summarization Task , 2008, TAC.

[94]  Claire Cardie,et al.  Toward Opinion Summarization: Linking the Sources , 2006 .

[95]  Andreas Stolcke,et al.  Dialogue act modeling for automatic tagging and recognition of conversational speech , 2000, CL.

[96]  K. Fischer,et al.  DESPERATELY SEEKING EMOTIONS OR: ACTORS, WIZARDS, AND HUMAN BEINGS , 2000 .

[97]  Koby Crammer,et al.  Flexible Text Segmentation with Structured Multilabel Classification , 2005, HLT.

[98]  Christian Posse,et al.  PNNL: A Supervised Maximum Entropy Approach to Word Sense Disambiguation , 2007, SemEval@ACL.

[99]  Robert Krovetz.  Viewing morphology as an inference process , 2000, Artif. Intell..

[100]  Iris Hendrickx,et al.  Using Coreference Links and Sentence Compression in Graph-based Summarization , 2008, TAC.

[101]  Dmitry Zelenko,et al.  Discriminative Methods for Transliteration , 2006, EMNLP.

[102]  Massimo Poesio,et al.  The predictive power of game structure in dialogue act recognition: experimental results using maximum entropy estimation , 1998, ICSLP.

[103]  R. Jones,et al.  Active Learning with Feedback on Both Features and Instances , 2006 .

[104]  David Vilar,et al.  Dialogue act classification using a Bayesian approach , 2004 .

[105]  Bing Liu,et al.  Mining and summarizing customer reviews , 2004, KDD.

[106]  Andrew McCallum,et al.  A Conditional Random Field for Discriminatively-trained Finite-state String Edit Distance , 2005, UAI.

[107]  Robert I. Damper,et al.  Aligning letters and phonemes for speech synthesis , 2004, SSW.

[108]  Mirella Lapata,et al.  Modeling Local Coherence: An Entity-Based Approach , 2005, ACL.

[109]  Kristina Toutanova,et al.  Pronunciation Modeling for Improved Spelling Correction , 2002, ACL.

[110]  Emanuele Pianta,et al.  The TextPro Tool Suite , 2008, LREC.

[111]  Grzegorz Kondrak,et al.  Applying Many-to-Many Alignments and Hidden Markov Models to Letter-to-Phoneme Conversion , 2007, NAACL.

[112]  Yukiko Sasaki Alam,et al.  Decision Trees for Sense Disambiguation of Prepositions: Case of Over , 2004, HLT-NAACL 2004.

[113]  Franz Josef Och,et al.  Minimum Error Rate Training in Statistical Machine Translation , 2003, ACL.

[114]  Nikiforos Karamanis,et al.  Evaluating Centering for Sentence Ordering in Two New Domains , 2006, NAACL.

[115]  Jason Eisner,et al.  Modeling Annotators: A Generative Approach to Learning from Annotator Rationales , 2008, EMNLP.

[116]  Eric K. Ringger,et al.  Pulse: Mining Customer Opinions from Free Text , 2005, IDA.

[117]  Kenneth C. Litkowski,et al.  SemEval-2007 Task 06: Word-Sense Disambiguation of Prepositions , 2007, Fourth International Workshop on Semantic Evaluations (SemEval-2007).

[118]  Csr Young,et al.  How to Do Things With Words , 2009 .

[119]  Elmar Nöth,et al.  Integrated dialog act segmentation and classification using prosodic features and language models , 1997, EUROSPEECH.

[120]  Hong-Goo Kang,et al.  A perspective on the next challenges for TTS research , 2002, Proceedings of 2002 IEEE Workshop on Speech Synthesis, 2002..

[121]  Hermann Ney,et al.  A Systematic Comparison of Various Statistical Alignment Models , 2003, CL.

[122]  Kathleen R. McKeown,et al.  Information fusion for multidocument summarization: paraphrasing and generation , 2003 .

[123]  Nicole Novielli,et al.  Attitude Display in Dialogue Patterns , 2008 .

[124]  Christine D. Piatko,et al.  Using “Annotator Rationales” to Improve Machine Learning for Text Categorization , 2007, NAACL.

[125]  Elizabeth Shriberg,et al.  Switchboard SWBD-DAMSL shallow-discourse-function annotation coders manual , 1997 .

[126]  Elena Lloret,et al.  A Text Summarization Approach under the Influence of Textual Entailment , 2016, NLPCS.

[127]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[128]  Lina Zhou,et al.  Movie Review Mining: a Comparison between Supervised and Unsupervised Classification Approaches , 2005, Proceedings of the 38th Annual Hawaii International Conference on System Sciences.

[129]  Ken Litkowski,et al.  The Preposition Project , 2021, ArXiv.

[130]  Shahram Khadivi,et al.  A Sequence Alignment Model Based on the Averaged Perceptron , 2007, EMNLP.

[131]  John B. Lowe,et al.  The Berkeley FrameNet Project , 1998, ACL.

[132]  John R. Searle,et al.  Speech Acts: An Essay in the Philosophy of Language , 1970 .

[133]  Deniz Yuret,et al.  KU: Word Sense Disambiguation by Substitution , 2007, Fourth International Workshop on Semantic Evaluations (SemEval-2007).

[134]  Thorsten Joachims,et al.  Evaluating Retrieval Performance Using Clickthrough Data , 2003, Text Mining.

[135]  Dan Klein,et al.  Prototype-Driven Learning for Sequence Models , 2006, NAACL.

[136]  Walter Daelemans,et al.  Language-Independent Data-Oriented Grapheme-to-Phoneme Conversion , 1996 .

[137]  Emanuele Pianta,et al.  IRST-BP: Preposition Disambiguation based on Chain Clarifying Relationships Contexts , 2007, SemEval@ACL.

[138]  J. M. Kittross.  The measurement of meaning , 1959 .

[139]  Grzegorz Kondrak,et al.  Automatic Syllabification with Structured SVMs for Letter-to-Phoneme Conversion , 2008, ACL.

[140]  Michael Collins,et al.  Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms , 2002, EMNLP.

[141]  Walter Daelemans,et al.  Do Not Forget: Full Memory in Memory-Based Learning of Word Pronunciation , 1998, CoNLL.

[142]  E. Schegloff.  Sequencing in Conversational Openings , 1968 .

[143]  Nicole Novielli,et al.  Social Attitude Towards A Conversational Character , 2006, ROMAN 2006 - The 15th IEEE International Symposium on Robot and Human Interactive Communication.

[144]  Gideon S. Mann,et al.  Learning from labeled features using generalized expectation criteria , 2008, SIGIR '08.

[145]  Robert L. Mercer,et al.  The Mathematics of Statistical Machine Translation: Parameter Estimation , 1993, CL.