Unsupervised Learning of Generalized Names

We present an algorithm, NOMEN, for learning generalized names in text. Examples of these are names of diseases and infectious agents, such as bacteria and viruses. These names exhibit certain properties that make their identification more complex than that of regular proper names. NOMEN uses a novel form of bootstrapping to grow sets of textual instances and of their contextual patterns. The algorithm makes use of competing evidence to boost the learning of several categories of names simultaneously. We present results of the algorithm on a large corpus. We also investigate the relative merits of several evaluation strategies.
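To make the idea of bootstrapping with competing categories concrete, the following is a minimal sketch and not NOMEN itself. It assumes a toy corpus of (context, name) pairs, a simple majority-share acceptance rule, and hypothetical names (`bootstrap`, `min_share`, `rounds`) that do not come from the paper. Patterns and names are accepted for a category only when the evidence for that category clearly outweighs the evidence for its competitors.

```python
from collections import defaultdict


def bootstrap(corpus, seeds, rounds=3, min_share=0.6):
    """Grow name sets and context patterns for several categories at once.

    corpus: list of (context, name) pairs (an illustrative simplification).
    seeds:  dict mapping category -> set of seed names.
    min_share: fraction of the evidence a category must win (assumed rule).
    """
    names = {cat: set(s) for cat, s in seeds.items()}
    patterns = {cat: set() for cat in seeds}

    for _ in range(rounds):
        # Step 1: count, per context, how often it occurs with known names
        # of each category; categories compete for the context.
        hits = defaultdict(lambda: defaultdict(int))
        for ctx, name in corpus:
            for cat, known in names.items():
                if name in known:
                    hits[ctx][cat] += 1
        for ctx, by_cat in hits.items():
            best = max(by_cat, key=by_cat.get)
            if by_cat[best] / sum(by_cat.values()) >= min_share:
                patterns[best].add(ctx)

        # Step 2: let accepted patterns vote on names not yet labelled.
        labelled = set().union(*names.values())
        votes = defaultdict(lambda: defaultdict(int))
        for ctx, name in corpus:
            if name in labelled:
                continue
            for cat, accepted in patterns.items():
                if ctx in accepted:
                    votes[name][cat] += 1
        for name, by_cat in votes.items():
            best = max(by_cat, key=by_cat.get)
            # A name with split evidence is left unlabelled.
            if by_cat[best] / sum(by_cat.values()) >= min_share:
                names[best].add(name)

    return names, patterns


# Toy data: "dengue" occurs equally in both contexts, so neither category wins it.
corpus = [
    ("outbreak of", "cholera"),
    ("outbreak of", "measles"),
    ("infected with", "HIV"),
    ("infected with", "rotavirus"),
    ("outbreak of", "dengue"),
    ("infected with", "dengue"),
]
seeds = {"disease": {"cholera"}, "agent": {"HIV"}}
names, patterns = bootstrap(corpus, seeds)
```

In this toy run the two seed names pull in one new name each, while the ambiguous name stays unassigned because its evidence is split evenly between the competing categories.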
