Paper Information - Automatic Document Categorisation by User Profile in Medline

We investigate improvements to term extraction for document representation and indexing in large document collections such as Medline, the premier bibliographic database of the U.S. National Library of Medicine (NLM). With term extraction methods such as AMTEx and MMTx, document representations become semantically compact and more efficient: each document is reduced to a limited number of meaningful multi-word terms (phrases) rather than a large vector of single words, some of which may carry no distinctive content semantics. We show how this information can be used to categorise medical documents automatically by user profile (i.e., novice users and experts). This is achieved by mapping document terms to external lexical resources such as WordNet and MeSH (the medical thesaurus of NLM). Evaluation results for all methods are presented and discussed.

Euripides G. M. Petrakis | Angelos Hliaoutakis
