Grid-based Support for Different Text Mining Tasks

This paper provides an overview of our research on the efficient use of Grid infrastructure for solving various text mining tasks. Grid-enabling of these tasks was driven mainly by the increasing volume of processed data. A Grid services approach makes it possible to run various text mining scenarios and also opens the way to designing distributed modifications of existing methods. In particular, some parts of the mining process can benefit significantly from the decomposition paradigm. In this study we present our approach to data-driven decomposition of a decision tree building algorithm and of a clustering algorithm based on self-organizing maps, and its application to the conceptual model building task using an FCA-based algorithm. The work presented here should be regarded as a 'proof of concept' for the design and implementation of decomposition methods, as we performed the experiments mostly on standard textual databases.
