Using Syntactic and Semantic Features for Classifying Modal Values in the Portuguese Language

This paper presents a study in a field that is poorly explored for the Portuguese language: modality and its automatic tagging. Our main goal was to find a set of attributes for building automatic taggers that outperform the bag-of-words (bow) approach. Performance was measured using precision, recall and \(F_1\). Because the field is relatively unexplored, the study covers the creation of the corpus (built around eleven verbs), the use of a parser to extract syntactic and semantic information from the sentences, and a machine learning approach to identify modality values. Based on three different sets of attributes (from the trigger itself, from the trigger's path in the parse tree, and from the context), the system creates a tagger for each verb, achieving an improvement in \(F_1\) for almost every verb when compared to the traditional bow approach.
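As an illustration of the evaluation measures named above, the following is a minimal sketch of how precision, recall and \(F_1\) can be computed for one modal value from gold and predicted labels. The function name, the label names ("epistemic", "deontic") and the toy data are assumptions made for this example only; they are not taken from the paper's corpus or results.

```python
def precision_recall_f1(gold, predicted, label):
    """Compute precision, recall and F1 for a single label.

    gold and predicted are equal-length lists of label strings.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)  # true positives
    fp = sum(1 for g, p in pairs if g != label and p == label)  # false positives
    fn = sum(1 for g, p in pairs if g == label and p != label)  # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical toy data: modal values assigned to five occurrences of one verb.
gold = ["epistemic", "deontic", "epistemic", "deontic", "epistemic"]
predicted = ["epistemic", "epistemic", "epistemic", "deontic", "deontic"]

p, r, f1 = precision_recall_f1(gold, predicted, "epistemic")
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Comparing two taggers for the same verb, such as a bow baseline and a tagger using trigger, path and context attributes, would then amount to computing these scores for each tagger's predictions against the same gold labels.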
