TM-Gen: A Topic Map Generator from Text Documents

The volume of text documents stored in digital format grows rapidly every day. As a result, tools that can find accurate information in natural language repositories have attracted great interest in recent years. Of particular interest are tools capable of handling large amounts of text and deriving human-readable summaries. A further step is not only to summarize, but also to extract the knowledge stored in those texts and even represent it graphically. In this paper we present an architecture that automatically generates a conceptual representation of the knowledge stored in a set of text-based documents. For this purpose we use the topic maps standard, and we have developed a method that combines text mining, statistics, linguistic tools, and semantics to obtain a graphical representation of the information contained in the documents, which can be encoded in a knowledge representation language such as RDF or OWL. The procedure is language-independent, fully automatic, and self-adjusting, and it requires no manual configuration by the user. Although validating a graphical knowledge representation system is highly subjective, we have been able to take advantage of an intermediate product of the process to carry out an experimental validation of our proposals.
