A Sequence Labelling Approach to Quote Attribution

Quote extraction and attribution is the task of automatically extracting quotes from text and attributing each quote to its correct speaker. The current state-of-the-art system relies on features that draw on gold-standard information about previous decisions; when that information is removed, performance drops sharply. We instead treat the problem as a sequence labelling task, which allows us to incorporate sequence features without using gold-standard information. We present results on two new corpora and an augmented version of a third, achieving a new state of the art for systems using only realistic features.
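To illustrate the general idea of decoding a whole sequence of quotes jointly, rather than conditioning each decision on gold-standard earlier decisions, the following is a minimal Viterbi sketch. It is not the system described here: the function name `viterbi`, the speaker labels "A" and "B", and all scores are hypothetical and hand-set for illustration, whereas a real system would learn such scores from data.

```python
from typing import Dict, List, Tuple


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float]) -> List[str]:
    """Return the highest-scoring label sequence.

    emissions[i][label] is a hypothetical local score for assigning
    `label` to quote i; transitions[(prev, cur)] is a hypothetical score
    for two adjacent labels. Missing transitions score 0.0.
    """
    if not emissions:
        return []
    # best[i][label] = (score of the best path ending in label at i, previous label)
    best = [{lab: (score, None) for lab, score in emissions[0].items()}]
    for i in range(1, len(emissions)):
        column = {}
        for cur, emit in emissions[i].items():
            # Pick the predecessor label that maximises path score + transition.
            prev_lab, prev_score = max(
                ((p, s + transitions.get((p, cur), 0.0))
                 for p, (s, _) in best[i - 1].items()),
                key=lambda item: item[1],
            )
            column[cur] = (prev_score + emit, prev_lab)
        best.append(column)
    # Backtrack from the best final label.
    label = max(best[-1], key=lambda lab: best[-1][lab][0])
    path = [label]
    for i in range(len(best) - 1, 0, -1):
        label = best[i][label][1]
        path.append(label)
    path.reverse()
    return path


# Toy example (assumed scores): three consecutive quotes, two candidate speakers.
toy_emissions = [
    {"A": 2.0, "B": 0.0},
    {"A": 0.6, "B": 0.5},  # locally, "A" looks slightly better
    {"A": 1.0, "B": 0.0},
]
# Assumed preference for alternating speakers, as in a dialogue.
toy_transitions = {("A", "B"): 1.0, ("B", "A"): 1.0}

print(viterbi(toy_emissions, toy_transitions))  # ['A', 'B', 'A']
```

In this toy case the second quote would be assigned to "A" if each quote were labelled in isolation, but joint decoding prefers "B" because the assumed transition scores reward alternation. No gold-standard label for an earlier quote is consulted at any point.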
