Paper information - USING WEKA FRAMEWORK IN DOCUMENT CLASSIFICATION

Text document classification is a special case of supervised data mining. Solving a text document classification problem typically involves the following steps: feature extraction, feature selection, classification, evaluation, and visualization. WEKA is a framework that supports all of these steps. It was initially developed as a library of Java classes for implementing data mining applications. In recent years, to remove the need for Java programming skills, the WEKA components have also been made available in visual form inside the “WEKA Knowledge Flow Environment”. In this paper we study and present some of the most important visual components that the WEKA framework provides for these steps: “Arff Loader”, “Attribute Selection”, “Normalize”, “Train Test Split Maker”, a large set of classifier algorithms, “Performance Evaluator”, and “Text Viewer”. To demonstrate the visual framework in text document classification, we present several experiments. The most important advantage of the visual WEKA framework is that different approaches can be tested without programming.

Daniel Morariu | Radu Crețulescu | Macarie Breazu
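
To make the sequence of steps named in the abstract more concrete, the following is a minimal plain-Java sketch of three of them: normalization, train/test splitting, and classification with evaluation. It deliberately does not call the WEKA API and is not the authors' implementation. The class `PipelineSketch`, all of its method names, the nearest-centroid classifier, and the toy term-count data are assumptions chosen purely for illustration. Feature extraction, feature selection, and visualization are omitted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only: mirrors the roles of "Normalize",
// "Train Test Split Maker", a classifier, and "Performance Evaluator".
public class PipelineSketch {

    // Min-max scale every column to [0, 1]; constant columns map to 0.
    static double[][] normalize(double[][] x) {
        int n = x.length, d = x[0].length;
        double[][] out = new double[n][d];
        for (int j = 0; j < d; j++) {
            double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
            for (double[] row : x) {
                min = Math.min(min, row[j]);
                max = Math.max(max, row[j]);
            }
            double range = max - min;
            for (int i = 0; i < n; i++) {
                out[i][j] = range == 0 ? 0.0 : (x[i][j] - min) / range;
            }
        }
        return out;
    }

    // Shuffle row indices with a fixed seed and cut them into [train, test].
    static List<List<Integer>> split(int n, double trainFraction, long seed) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) idx.add(i);
        Collections.shuffle(idx, new Random(seed));
        int cut = (int) Math.round(n * trainFraction);
        List<List<Integer>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(idx.subList(0, cut)));
        parts.add(new ArrayList<>(idx.subList(cut, n)));
        return parts;
    }

    // "Training" here is just the per-class mean vector (nearest-centroid model).
    // A class with no training rows gets a NaN centroid and is never predicted.
    static double[][] train(double[][] x, int[] y, List<Integer> rows, int numClasses) {
        int d = x[0].length;
        double[][] centroids = new double[numClasses][d];
        int[] counts = new int[numClasses];
        for (int i : rows) {
            counts[y[i]]++;
            for (int j = 0; j < d; j++) centroids[y[i]][j] += x[i][j];
        }
        for (int c = 0; c < numClasses; c++) {
            for (int j = 0; j < d; j++) centroids[c][j] /= counts[c];
        }
        return centroids;
    }

    // Predict the class whose centroid is closest in squared Euclidean distance.
    static int predict(double[][] centroids, double[] row) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double dist = 0;
            for (int j = 0; j < row.length; j++) {
                double diff = row[j] - centroids[c][j];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    // Fraction of the given rows that the model labels correctly.
    static double accuracy(double[][] centroids, double[][] x, int[] y, List<Integer> rows) {
        int correct = 0;
        for (int i : rows) {
            if (predict(centroids, x[i]) == y[i]) correct++;
        }
        return (double) correct / rows.size();
    }

    public static void main(String[] args) {
        // Toy term counts for two made-up terms; labels 0 and 1 are invented classes.
        double[][] termCounts = { {5, 0}, {4, 1}, {6, 1}, {0, 5}, {1, 4}, {1, 6} };
        int[] labels = {0, 0, 0, 1, 1, 1};

        double[][] x = normalize(termCounts);
        List<List<Integer>> parts = split(x.length, 0.67, 42L);
        double[][] model = train(x, labels, parts.get(0), 2);
        System.out.println("Test accuracy: " + accuracy(model, x, labels, parts.get(1)));
    }
}
```

In the Knowledge Flow environment each of these methods would correspond to a visual component wired to the next one, so swapping the classifier or the split ratio means changing one node rather than editing code.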