Building a Greek corpus for Textual Entailment

The paper reports on completed work aimed at creating a resource, the Greek Textual Entailment Corpus (GTEC), that is suitable for training and evaluating a system that recognizes Textual Entailment in Greek texts. The textual units in the corpus were collected with a range of NLP applications in mind, applications in which semantic interpretation is of paramount importance, and were manually annotated for Textual Entailment. A number of linguistic annotations that were deemed useful for prospective system developers were also integrated. The critical issue was to develop a final resource that is reusable and adaptable to different NLP systems, so that it can be used either to enhance their accuracy or to evaluate their output. Here we focus on the methodological issues underpinning data selection and annotation. An initial approach towards a system for the automatic Recognition of Textual Entailment in Greek is also presented, and preliminary results are reported.
