A Multi- versus a Single-classifier Approach for the Identification of Modality in the Portuguese Language

This work presents a comparative study of two approaches to building an automatic classification system for modality values in the Portuguese language. One approach trains a single multi-class classifier on the full dataset, which covers eleven modal verbs; the other builds a separate classifier for each verb. Performance is measured using precision, recall and F1. Because the dataset is unbalanced, a weighted average was calculated for each metric. We used support vector machines as our classifier and experimented with various SVM kernels to find the best classifier for the task. We also experimented with several types of features representing parse tree information and compared these complex feature representations against a simple bag-of-words representation as a baseline. The best F1 values obtained are above 0.60, and the results indicate that there is no significant difference between the two approaches.
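As an illustration of the weighted averaging mentioned above, the following minimal sketch computes per-class precision, recall and F1 and weights each by the class's share of the gold labels. It assumes the usual support-weighted definition; the abstract does not spell out the exact weighting used, and the function name `weighted_prf` and the example labels are hypothetical placeholders, not taken from the study's data.

```python
from collections import Counter


def weighted_prf(gold, pred):
    """Return support-weighted (precision, recall, F1).

    Assumption: each class is weighted by its frequency in the gold labels.
    """
    support = Counter(gold)
    total = len(gold)
    w_precision = w_recall = w_f1 = 0.0
    for label, n_gold in support.items():
        # True positives: items of this class that were predicted as this class.
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        n_pred = sum(1 for p in pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_gold
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        weight = n_gold / total
        w_precision += weight * precision
        w_recall += weight * recall
        w_f1 += weight * f1
    return w_precision, w_recall, w_f1


# Hypothetical, unbalanced toy labels (not from the paper's corpus).
gold = ["epistemic", "epistemic", "epistemic", "deontic"]
pred = ["epistemic", "epistemic", "deontic", "deontic"]
p, r, f = weighted_prf(gold, pred)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

With this weighting, frequent classes dominate the averaged score, which is why it is commonly reported alongside per-class figures when class sizes differ widely.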
