NEER: An Unsupervised Method for Named Entity Evolution Recognition

High-impact events, political changes, and new technologies are reflected in our language and lead to the constant evolution of terms, expressions, and names. Not knowing the names used in the past to refer to a named entity can severely degrade the performance of many computational linguistics algorithms. We propose NEER, an unsupervised method for named entity evolution recognition that is independent of external knowledge sources. We first find time periods with a high likelihood of evolution. By analyzing only these time periods with a sliding-window co-occurrence method, we capture evolving terms in the same context. We thus avoid comparing terms from widely different periods in time and overcome a severe limitation of existing methods for named entity evolution, as shown by the high recall of 90% on the New York Times corpus. We compare several relatedness measures for filtering to improve precision. Furthermore, using machine learning with minimal supervision improves precision to 94%.
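
The two steps named in the abstract (locating periods with a high likelihood of evolution, then collecting co-occurring terms inside a sliding window) can be illustrated with a minimal Python sketch. This is not the NEER implementation: the function names `candidate_periods` and `cooccurring_terms`, the mean-plus-deviation threshold used in place of a proper burst detector, the window size, and the toy data are all assumptions made for illustration only.

```python
from collections import Counter
from statistics import mean, pstdev


def candidate_periods(yearly_counts, k=1.0):
    """Return the years in which a term is unusually frequent.

    Assumption: a simple mean + k * standard deviation threshold stands in
    for whatever change-period detection the method actually uses.
    """
    values = list(yearly_counts.values())
    threshold = mean(values) + k * pstdev(values)
    return sorted(year for year, count in yearly_counts.items() if count > threshold)


def cooccurring_terms(documents, target, window=2):
    """Count the terms found within `window` tokens of `target`.

    `documents` is a list of token lists drawn from one candidate period.
    """
    counts = Counter()
    for tokens in documents:
        for i, token in enumerate(tokens):
            if token != target:
                continue
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                # Skip the target itself, including repeated mentions.
                if tokens[j] != target:
                    counts[tokens[j]] += 1
    return counts


# Hypothetical yearly mention counts for the term "benedict".
yearly = {2003: 2, 2004: 3, 2005: 40, 2006: 4}
periods = candidate_periods(yearly)

# Hypothetical tokenized sentences taken from the detected period.
docs = [
    ["cardinal", "joseph", "ratzinger", "elected", "pope", "benedict", "xvi"],
    ["pope", "benedict", "succeeds", "john", "paul"],
]
context = cooccurring_terms(docs, "benedict", window=2)
```

Restricting the co-occurrence count to documents from the detected periods reflects the abstract's point that terms from widely different periods are not compared. Filtering the resulting candidates with a relatedness measure or a classifier is a separate step not shown here.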
