Open Mind Word Expert: Creating Large Annotated Data Collections with Web Users’ Help

Open Mind Word Expert is an implemented active learning system that aims to create large annotated corpora by drawing on the knowledge of the many Web users willing to contribute to data annotation. It focuses on building semantically annotated corpora by collecting word sense tags from the general public over the Web. The system is available at http://teachcomputers.org. During its first nine months of activity, it yielded 90,000 high-quality tagged items at a much lower cost than the traditional method of hiring lexicographers.
