Context and Keyword Extraction in Plain Text Using a Graph Representation

Document indexing is an essential task performed by archivists or by automatic indexing tools. For documents relevant to a query to be retrieved, the keywords describing each document have to be chosen carefully. Archivists must identify the topic of a document before they can start extracting its keywords. For an archivist who indexes specialized documents, experience plays an important role, but indexing documents that span different topics is much harder. This article proposes a method for an indexing support system. The system takes as input an ontology and a plain text document and provides as output contextualized keywords for the document. The method has been evaluated using Wikipedia's category links as a termino-ontological resource.
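
The abstract does not detail the algorithm, so the following is only a minimal sketch of the stated input/output contract (an ontology plus plain text in, a context and its keywords out), not the authors' method. It assumes a toy graph of category links in the spirit of Wikipedia's; the names `CATEGORY_LINKS`, `ancestors`, and `contextual_keywords`, the single-word matching, and the voting heuristic are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical toy resource: each term or category maps to its parent categories.
CATEGORY_LINKS = {
    "python": ["programming languages", "snakes"],
    "java": ["programming languages", "islands"],
    "compiler": ["programming tools"],
    "programming languages": ["computing"],
    "programming tools": ["computing"],
    "snakes": ["reptiles"],
    "islands": ["geography"],
}

def ancestors(node, links):
    """Return every category reachable from node by following parent links."""
    seen, stack = set(), list(links.get(node, []))
    while stack:
        cat = stack.pop()
        if cat not in seen:
            seen.add(cat)
            stack.extend(links.get(cat, []))
    return seen

def contextual_keywords(text, links):
    """Pick the category shared by the most terms, then keep the terms under it."""
    words = re.findall(r"[a-z]+", text.lower())
    # Candidate keywords: single words of the text that appear in the resource.
    candidates = {w for w in words if w in links}
    # Each candidate votes once for every category above it in the graph.
    votes = Counter()
    for term in candidates:
        votes.update(ancestors(term, links))
    if not votes:
        return None, []
    # Ties are broken alphabetically so the result is deterministic.
    context = max(sorted(votes), key=lambda c: votes[c])
    keywords = sorted(t for t in candidates if context in ancestors(t, links))
    return context, keywords

context, keywords = contextual_keywords(
    "The Python compiler and the Java compiler differ.", CATEGORY_LINKS
)
print(context, keywords)
```

In this toy graph, "python" and "java" are ambiguous on their own (snake, island), and the shared ancestor category is what resolves them to a single context. A realistic system would also need multi-word term matching and a weighting scheme, neither of which is attempted here.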
