Automatic free-text-tagging of online news archives

In this paper, we introduce the problem of free-text-tagging of online news archives. From an application point of view, the task has many benefits for online news portals; at the same time, it has unique characteristics compared with the settings addressed by existing free-text-tagging approaches. We describe our system, which was developed for the archive (consisting of 370 thousand articles) of the most visited Hungarian news portal, www.origo.hu, along with the research questions encountered and solved during the work. As the evaluation of tagging is not straightforward, at the end of the project the news company manually inspected the tags assigned by the automatic system, which yielded an F-measure of 71.9.
