Team Association Analysis for Named Entity Filtering

Abstract : This paper describes the participation of the Universities of Helsinki and Caen in the first round of the TREC Knowledge Base Acceleration track3. The task focused on filtering a stream of documents relevant to a set of entities. Our approach uses word co-occurrence graphs for modelling the named entities. We submitted two runs that achieved an average F-measure superior to the mean of all submitted runs. The best of those runs ranked in the top 5 runs for both the central and relevant F-measures, out of a total of 43 runs submitted by 11 institutions. As our runs were the produce of a first implementation of our approach these preliminary results are very supportive of our idea to use concept graphs for modelling named entity relations.

[1]  Heng Ji,et al.  Knowledge Base Population: Successful Approaches and Challenges , 2011, ACL.

[2]  Michael Gamon Graph-Based Text Representation for Novelty Detection , 2006 .

[3]  Valentin I. Spitkovsky,et al.  A Cross-Lingual Dictionary for English Wikipedia Concepts , 2012, LREC.

[4]  Ian Soboro Overview of the TREC 2004 Novelty Track , 2004 .

[5]  Ian Soboroff,et al.  Overview of the TREC 2004 Novelty Track , 2004, TREC.

[6]  Fabrizio Sebastiani,et al.  Machine learning in automated text categorization , 2001, CSUR.

[7]  Hui Fang,et al.  Entity Profile Based Approach in Automatic Knowledge Finding , 2012, TREC.

[8]  Dedre Gentner,et al.  Why Nouns Are Learned before Verbs: Linguistic Relativity Versus Natural Partitioning. Technical Report No. 257. , 1982 .

[9]  Paul McNamee,et al.  The HLTCOE Approach to the TREC 2012 KBA Track , 2012, TREC.

[10]  George A. Miller,et al.  WordNet: A Lexical Database for English , 1995, HLT.

[11]  Feng Niu,et al.  Building an Entity-Centric Stream Filtering Test Collection for TREC 2012 , 2012, TREC.

[12]  Rada Mihalcea,et al.  Wikify!: linking documents to encyclopedic knowledge , 2007, CIKM '07.

[13]  Ralf Steinberger,et al.  A survey of methods to ease the development of highly multilingual text mining applications , 2011, Language Resources and Evaluation.

[14]  Andrew Trotman,et al.  Focused Access to XML Documents, 6th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2007, Dagstuhl Castle, Germany, December 17-19, 2007. Selected Papers , 2008, INEX.

[15]  Ted Dunning,et al.  Accurate Methods for the Statistics of Surprise and Coincidence , 1993, CL.

[16]  Arjen P. de Vries,et al.  CWI at TREC 2012, KBA Track and Session Track , 2012, TREC.