Hierarchical rule generalisation for speaker identification in fiction books

This paper presents a hierarchical pattern matching and generalisation technique and applies it to the problem of identifying the correct speaker of quoted speech in fiction books. Patterns from a training set are generalised into a small number of rules, which can then be used to identify items of interest within the text. Here the technique is used to find the Speech-Verb, Actor and Speaker of each quote. It performs well on the training data, producing rule-sets many times smaller than the training set while retaining very high accuracy. A rule-set generalised from one book is less effective on different books than an approach based on hand-coded heuristics, but performance is comparable when testing on data closely related to the training set.
