Managing the knowledge contained in electronic documents: a clustering method for text mining

The huge amount of unstructured data available on the Web and on intranets creates an information overload problem. Managing the knowledge contained in textual documents is therefore an important problem in Knowledge Management. Knowledge can be extracted from collections of data through Knowledge Discovery in Databases (KDD), an interactive and iterative process that explores data to discover new and interesting patterns within them. The fundamental phase of the KDD process is Data Mining when the data are structured and Text Mining when they are unstructured. This paper describes a prototype of a vertical corporate portal that implements a KDD process for extracting knowledge from the unstructured data contained in textual documents. Text mining is performed by a clustering method that partitions a set of documents according to their content, which is characterized by the frequencies of the words they contain.
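The abstract does not say which clustering algorithm the prototype uses. As a minimal sketch, assuming spherical k-means over length-normalized word-frequency vectors (a swapped-in technique, not necessarily the authors' method), the text-mining step could look like the code below. The comments mark the usual KDD stages (preprocessing, transformation, mining, interpretation). The stop-word list, the deterministic farthest-first seeding, and the functions `term_frequencies`, `cluster`, and `top_terms` are hypothetical illustrations.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption, not the paper's).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "on", "to", "for", "is", "are"}


def term_frequencies(text):
    """Preprocessing + transformation: lowercase, tokenize, drop stop words, count."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def normalize(vec):
    """Scale a sparse word-frequency vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else dict(vec)


def cosine(u, v):
    """For unit-length vectors, the dot product equals cosine similarity."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def centroid(vectors):
    """Sum of member vectors, renormalized (same direction as their mean)."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return normalize(total)


def cluster(texts, k, iterations=20):
    """Mining: spherical k-means, returning one cluster label per document."""
    if not 1 <= k <= len(texts):
        raise ValueError("k must be between 1 and the number of documents")
    docs = [normalize(term_frequencies(t)) for t in texts]
    # Deterministic farthest-first seeding: start from document 0, then keep
    # adding the document least similar to its most similar existing seed.
    seeds = [0]
    while len(seeds) < k:
        seeds.append(min(
            (i for i in range(len(docs)) if i not in seeds),
            key=lambda i: max(cosine(docs[i], docs[s]) for s in seeds),
        ))
    centroids = [docs[s] for s in seeds]
    labels = None
    for _ in range(iterations):
        new_labels = [max(range(k), key=lambda c: cosine(d, centroids[c])) for d in docs]
        if new_labels == labels:
            break  # converged: no document changed cluster
        labels = new_labels
        for c in range(k):
            members = [docs[i] for i, lab in enumerate(labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = centroid(members)
    return labels


def top_terms(texts, labels, n=3):
    """Interpretation: the n most frequent words in each cluster."""
    totals = {}
    for text, lab in zip(texts, labels):
        totals.setdefault(lab, Counter()).update(term_frequencies(text))
    return {lab: [w for w, _ in c.most_common(n)] for lab, c in sorted(totals.items())}


if __name__ == "__main__":
    texts = [
        "stock market prices and market shares",
        "the market rallied as stock prices rose",
        "football match ends in a draw",
        "the football team won the match",
    ]
    labels = cluster(texts, k=2)
    print(labels)  # [0, 0, 1, 1]
    print(top_terms(texts, labels, n=2))  # {0: ['market', 'stock'], 1: ['football', 'match']}
```

Normalizing each frequency vector to unit length before comparing means a long and a short document about the same topic still end up close together; the similarity measure and algorithm used in the paper's prototype may differ.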
