Text Mining for Documents Annotation and Ontology Support

This paper presents a survey of basic concepts in the area of text data mining and some of the methods used to elicit useful knowledge from collections of textual data. Three text data mining techniques (clustering/visualisation, association rules and classification models) are analysed, and their exploitation possibilities within the Webocracy project are shown. Clustering and association rule discovery are well suited as supporting tools for ontology management. Classification models are used for automatic document annotation.
