A Methodology for Terminology-based Knowledge Acquisition and Integration

We propose an integrated knowledge management system that combines terminology-based knowledge acquisition, knowledge integration, and XML-based knowledge retrieval by means of tag information and ontology management tools. The main objective of the system is to facilitate knowledge acquisition through query answering against XML-based documents in the domain of molecular biology. The system integrates automatic term recognition, term variation management, context-based automatic term clustering, ontology-based inference, and intelligent tag information retrieval. Tag-based retrieval is implemented through interval operations, which prove to be a powerful means for text mining and knowledge acquisition. The aim is to provide efficient access to heterogeneous biological textual data and databases, so that users can integrate a wide range of textual and non-textual resources with little effort.
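As a rough illustration of what tag-based retrieval through interval operations could look like, the sketch below treats each tagged region as a pair of character offsets and answers a query by combining sets of such intervals. The function names (`containing`, `contained_in`, `intersect`), the tag classes, and the offsets are all hypothetical and chosen for illustration only; the abstract does not specify the system's actual operations or data model.

```python
from typing import List, Tuple

# Assumption: a tagged region is a half-open [start, end) span of character offsets.
Interval = Tuple[int, int]


def encloses(outer: Interval, inner: Interval) -> bool:
    """True if `outer` fully covers `inner`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def containing(outer: List[Interval], inner: List[Interval]) -> List[Interval]:
    """Intervals of `outer` that enclose at least one interval of `inner`."""
    return [o for o in outer if any(encloses(o, i) for i in inner)]


def contained_in(inner: List[Interval], outer: List[Interval]) -> List[Interval]:
    """Intervals of `inner` that lie inside at least one interval of `outer`."""
    return [i for i in inner if any(encloses(o, i) for o in outer)]


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intervals that occur in both lists, in the order of `a`."""
    in_b = set(b)
    return [x for x in a if x in in_b]


# Hypothetical tag regions extracted from one XML-annotated document.
sentences = [(0, 50), (50, 120), (120, 200)]
protein_terms = [(10, 18), (130, 140)]
dna_terms = [(30, 40), (60, 70)]

# Query: sentences that mention both a protein term and a DNA term.
both = intersect(containing(sentences, protein_terms),
                 containing(sentences, dna_terms))

# Follow-up: the protein terms that occur inside those sentences.
hits = contained_in(protein_terms, both)
```

With these made-up offsets, only the first sentence encloses a term of each class, so the query narrows three sentences down to one. A real system would presumably index intervals rather than scan them pairwise; the quadratic scans here are kept only for readability.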
