Learning Graph Walk Based Similarity Measures for Parsed Text

We treat a parsed text corpus as a labelled directed graph in which nodes represent words and weighted directed edges represent the syntactic relations between them. We show that graph walks, combined with existing supervised learning techniques, can be used to derive a task-specific word similarity measure in this graph. We also propose a new path-constrained graph walk method, in which the walk is guided by high-level knowledge about meaningful edge sequences (paths). Empirical evaluation on the task of named entity coordinate term extraction shows that this framework is preferable to vector-based models for small corpora, and that the path-constrained graph walk algorithm yields gains in both performance and scalability.
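The idea of a graph walk over typed syntactic edges, with and without path constraints, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy graph, the edge labels (`conj_and`, `dobj`, `dobj_inv`, `nsubj`), the function names `build_adjacency` and `graph_walk`, the hand-set `label_weights` (which a supervised learner would tune in the framework described above), and the choice to score a node by the total probability mass that reaches it over a fixed number of steps. It is not the algorithm evaluated in the paper.

```python
from collections import defaultdict

# Toy labelled directed graph as (source, edge_label, target) triples.
# Nodes and labels are hypothetical, chosen only for illustration.
EDGES = [
    ("Paris", "conj_and", "London"),
    ("London", "conj_and", "Paris"),
    ("visited", "dobj", "Paris"),
    ("Paris", "dobj_inv", "visited"),
    ("visited", "dobj", "Berlin"),
    ("Berlin", "dobj_inv", "visited"),
    ("visited", "nsubj", "she"),
]


def build_adjacency(edges):
    """Map each node to its outgoing (label, target) pairs."""
    adj = defaultdict(list)
    for src, label, dst in edges:
        adj[src].append((label, dst))
    return adj


def graph_walk(adj, start, steps, label_weights=None, allowed_paths=None):
    """Spread probability mass from `start` for a fixed number of steps.

    label_weights: edge label -> non-negative weight (default 1.0).
    allowed_paths: optional list of label sequences; when given, the walk
        may only follow edges that extend a prefix of one of these paths.
    Returns node -> total mass that reached the node across all steps.
    """
    label_weights = label_weights or {}
    allowed_prefixes = None
    if allowed_paths is not None:
        allowed_prefixes = {
            tuple(path[:i])
            for path in allowed_paths
            for i in range(1, len(path) + 1)
        }

    # The walk state pairs a node with the label sequence used to reach it,
    # so the path constraint can be checked. Fine for a toy graph; a real
    # implementation would need a more compact state.
    frontier = {(start, ()): 1.0}
    scores = defaultdict(float)

    for _ in range(steps):
        next_frontier = defaultdict(float)
        for (node, path), mass in frontier.items():
            options = []
            for label, dst in adj.get(node, []):
                new_path = path + (label,)
                if allowed_prefixes is not None and new_path not in allowed_prefixes:
                    continue  # edge does not continue any allowed path
                weight = label_weights.get(label, 1.0)
                if weight > 0:
                    options.append((weight, dst, new_path))
            total = sum(weight for weight, _, _ in options)
            for weight, dst, new_path in options:
                # Mass is split in proportion to the edge-label weights.
                next_frontier[(dst, new_path)] += mass * weight / total
        frontier = next_frontier
        for (node, _), mass in frontier.items():
            scores[node] += mass

    return dict(scores)


adj = build_adjacency(EDGES)
unconstrained = graph_walk(adj, "Paris", steps=2)
constrained = graph_walk(
    adj, "Paris", steps=2, allowed_paths=[("dobj_inv", "dobj")]
)
```

In this toy setup the unconstrained walk from `Paris` also sends mass to `she` through the `nsubj` edge, while the walk constrained to the label sequence `dobj_inv`, `dobj` only reaches words that share a verb with `Paris` as direct object. The constrained walk also expands fewer edges per step, which hints at why restricting paths could help scalability.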
