OntoNotes: Sense Pool Verification Using Google N-gram and Statistical Tests

The OntoNotes project has developed a methodology for producing a large multilingual corpus annotated with predicate-argument structure, word senses, ontology links, and coreference. The underlying semantic model of OntoNotes groups word senses into sense pools, i.e., sets of near-synonymous word senses. Such information is useful for many applications, including query expansion for information retrieval (IR) systems, (near-)duplicate detection for text summarization systems, and alternative word selection for writing support systems. Once senses have been created and verified through annotation, an expert forms the sense pools. This paper addresses the verification of those pools and describes a two-stage framework that combines machine and human verification. The machine verification acts as a filter: it selects candidate pool members based on n-gram frequencies obtained from Google, to which appropriate statistical measures are applied. The candidates that pass the filter are then given to human judges for final verification. Our experimental results demonstrate that machine verification substantially reduces the human verification workload and thus facilitates the development of sense pools.
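
The machine verification stage described above can be illustrated with a minimal sketch. The abstract does not name the statistical measure, so pointwise mutual information (PMI) is used here purely as an assumed stand-in. The function names `pmi` and `machine_filter`, the threshold, and all counts are hypothetical; the counts are invented toy values, not Google n-gram data.

```python
import math


def pmi(joint, count_x, count_y, total):
    """Pointwise mutual information (base 2) from raw counts.

    Returns -inf when the pair never co-occurs or a count is zero.
    """
    if joint == 0 or count_x == 0 or count_y == 0:
        return float("-inf")
    return math.log2((joint * total) / (count_x * count_y))


def machine_filter(candidates, context, counts, total, threshold=0.0):
    """Stage 1 (assumed form): keep candidates whose PMI with the
    context word exceeds the threshold. Survivors would then be
    passed to human judges for final verification."""
    kept = []
    for word in candidates:
        score = pmi(
            counts.get((word, context), 0),
            counts.get(word, 0),
            counts.get(context, 0),
            total,
        )
        if score > threshold:
            kept.append(word)
    return kept


# Toy counts, invented for illustration only.
TOY_COUNTS = {
    "strong": 1000, "powerful": 800, "sturdy": 200, "tea": 500,
    ("strong", "tea"): 50, ("powerful", "tea"): 1, ("sturdy", "tea"): 0,
}
TOTAL = 100_000

survivors = machine_filter(["strong", "powerful", "sturdy"], "tea", TOY_COUNTS, TOTAL)
print(survivors)
```

In this sketch only the candidates that co-occur with the context more often than chance would predict reach the human stage, which is the sense in which an automatic filter can reduce manual effort. A real system would need actual n-gram frequencies and whichever statistical test the authors adopt.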
