Paper information - Symbolic word clustering for medium-size corpora

Symbolic word clustering for medium-size corpora

When trying to identify essential concepts and relationships in a medium-size corpus, it is not always possible to rely on statistical methods, as the frequencies are too low. We present an alternative, symbolic method based on the simplification of parse trees. We discuss the results on nominal phrases from two technical corpora, analyzed by two different robust parsers used for terminology updating in an industrial company. We compare our results with Hindle's similarity scores.

[1] P. Zweigenbaum, et al. MENELAS: an access system for medical records using natural language, 1994, Computer methods and programs in biomedicine.

[2] Kenneth Ward Church, et al. Word Association Norms, Mutual Information, and Lexicography, 1989, ACL.

[3] Ralph Grishman, et al. Generalizing Automatically Generated Selectional Patterns, 1994, COLING.

[4] Roberto Basili, et al. A "not-so-shallow" parser for collocational analysis, 1994, COLING.

[5] Gregory Grefenstette, et al. Explorations in automatic thesaurus discovery, 1994.

[6] Benoît Habert, et al. Simplifier des arbres d'analyse pour dégager les comportements syntactico-sémantiques des formes d'un corpus, 1995.

[7] Alan F. Smeaton, et al. Using morpho-syntactic language analysis in phrase matching, 1991, RIAO.

[8] Donald Hindle, et al. Noun Classification From Predicate-Argument Structures, 1990, ACL.

[9] Stephanie W. Haas, et al. The constituent object parser: syntactic structure matching for information retrieval, 1989, SIGIR '89.

[10] Frank Smadja, et al. Retrieving Collocations from Text: Xtract, 1993, CL.

[11] Didier Bourigault, et al. An Endogeneous Corpus-Based Method for Structural Noun Phrase Disambiguation, 1993, EACL.