Random Walks on Text Structures

Since the early days of artificial intelligence, associative or semantic networks have been proposed as representations for storing language units and the relationships that connect them. Such networks support a variety of inference and reasoning processes and simulate some of the functions of the human mind. The symbolic structures that emerge from these representations correspond naturally to graphs: relational structures that can encode the meaning and structure of a cohesive text, closely following associative or semantic memory representations. The activation or ranking of nodes in such graphs mimics, to some extent, the functioning of human memory, and can serve as a rich source of knowledge for several language processing applications. In this paper, we propose a framework for applying graph-based ranking algorithms to natural language processing, and illustrate it on two traditionally difficult text processing tasks: word sense disambiguation and text summarization.
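To make the idea of ranking nodes by a random walk concrete, the following is a minimal sketch of a PageRank-style iteration on a small undirected graph. It is only an illustration under stated assumptions, not the framework described in the paper: the function name `rank_nodes`, the damping value of 0.85, the fixed iteration count, and the toy word graph are all hypothetical choices made for this example.

```python
def rank_nodes(edges, damping=0.85, iterations=50):
    """Score the nodes of an undirected graph with a PageRank-style random walk.

    All names and default values here are illustrative assumptions.
    """
    # Build a symmetric adjacency map from the edge list.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    n = len(neighbors)
    # Start from a uniform distribution over nodes.
    scores = {node: 1.0 / n for node in neighbors}

    for _ in range(iterations):
        new_scores = {}
        for node in neighbors:
            # Each neighbor passes on an equal share of its current score.
            incoming = sum(scores[nb] / len(neighbors[nb]) for nb in neighbors[node])
            # With probability (1 - damping) the walk jumps to a random node.
            new_scores[node] = (1.0 - damping) / n + damping * incoming
        scores = new_scores
    return scores


# Hypothetical toy graph: words linked when they co-occur in a text window.
edges = [
    ("graph", "ranking"),
    ("graph", "text"),
    ("graph", "node"),
    ("ranking", "text"),
    ("node", "memory"),
]
scores = rank_nodes(edges)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In this toy graph the best-connected word receives the highest score. For text processing, the nodes could instead be word senses or sentences and the edges some measure of semantic relatedness or similarity, which is the general direction the abstract points to.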
