A French Fairy Tale Corpus syntactically and semantically annotated

Fairy tales, folktales and, more generally, children's stories have recently attracted the attention of the Natural Language Processing (NLP) community. Nevertheless, very few corpora exist, and linguistic resources are lacking. The work presented in this paper aims to fill this gap by presenting a syntactically and semantically annotated corpus. It focuses on the linguistic analysis of a fairy tale corpus and describes the syntactic and semantic resources developed for Information Extraction. These resources include: syntactic dependency relation annotation for 120 verbs; referential annotation, which links each anaphoric occurrence and proper name to the most specific noun in the text; ontology matching for a substantial portion of the nouns in the corpus; and semantic role labelling for 41 verbs using the FrameNet database. The article also summarizes previous analyses of the corpus and outlines its possible uses for the NLP community.
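The abstract lists four annotation layers without showing how they combine for Information Extraction. The following is a minimal sketch, assuming a hypothetical in-memory representation: the class and field names, the Stanford-style dependency labels (`nsubj`, `dobj`) and the ontology classes `Animal` and `Human` are illustrative assumptions, not the corpus's actual format. Only the `Ingestion` frame and its `Ingestor` and `Ingestibles` elements come from FrameNet. The example sentence "Il mangea la grand-mère." has its pronoun resolved to "loup", as the referential layer would do.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Token:
    idx: int    # position in the sentence (0-based)
    form: str   # surface form as it appears in the tale
    lemma: str


@dataclass
class Dependency:
    head: int   # index of the governing token (here, the verb)
    dep: int    # index of the dependent token
    label: str  # relation name; Stanford-style labels are an assumption


@dataclass
class FrameAnnotation:
    target: int               # index of the frame-evoking verb
    frame: str                # FrameNet frame name
    elements: dict[str, int]  # frame element name -> token index


@dataclass
class AnnotatedSentence:
    tokens: list[Token]
    dependencies: list[Dependency] = field(default_factory=list)
    # Referential layer: token index of an anaphor or proper name -> most specific noun.
    referents: dict[int, str] = field(default_factory=dict)
    # Ontology layer: noun lemma -> ontology class (class names are hypothetical).
    ontology: dict[str, str] = field(default_factory=dict)
    frames: list[FrameAnnotation] = field(default_factory=list)

    def resolve(self, i: int) -> str:
        """Return the most specific noun for token i, or its lemma if unannotated."""
        return self.referents.get(i, self.tokens[i].lemma)

    def role_fillers(self) -> list[tuple[str, str, str, str | None]]:
        """Join the semantic role, referential and ontology layers:
        (frame, frame element, resolved noun, ontology class or None)."""
        rows = []
        for fa in self.frames:
            for name, i in fa.elements.items():
                noun = self.resolve(i)
                rows.append((fa.frame, name, noun, self.ontology.get(noun)))
        return rows


# Hypothetical annotation of "Il mangea la grand-mère."
example = AnnotatedSentence(
    tokens=[
        Token(0, "Il", "il"),
        Token(1, "mangea", "manger"),
        Token(2, "la", "le"),
        Token(3, "grand-mère", "grand-mère"),
        Token(4, ".", "."),
    ],
    dependencies=[Dependency(1, 0, "nsubj"), Dependency(1, 3, "dobj")],
    referents={0: "loup"},  # "Il" refers back to the wolf
    ontology={"loup": "Animal", "grand-mère": "Human"},
    frames=[FrameAnnotation(1, "Ingestion", {"Ingestor": 0, "Ingestibles": 3})],
)

for row in example.role_fillers():
    print(row)
```

Keeping the layers separate and joining them only in `role_fillers` mirrors the fact that they cover different subsets of the corpus (120 verbs for dependencies, 41 for semantic roles), so any layer may be missing for a given sentence; unmatched nouns simply yield `None` as their ontology class.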