Distributed boosting algorithm for classification of text documents

This paper focuses on the analysis and classification of text documents. We classify documents using a boosting method applied to a decision tree algorithm. The main objective of the paper is to present an implementation of a distributed boosting algorithm based on the MapReduce paradigm. We used the GridGain framework as the platform for distributed data processing and tested the implemented solution on two different datasets within our testing environment.
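To illustrate how one boosting round can be split into map and reduce steps, the following is a minimal sketch and not the implementation described in the paper. It makes several assumptions: it uses AdaBoost with one-term decision stumps instead of full decision trees, documents are binary term-presence vectors with labels in {+1, -1}, and the partitions are processed in a plain Python loop rather than on GridGain nodes. All function names (`map_partition`, `reduce_errors`, `train`, `predict`) are hypothetical.

```python
import math
from functools import reduce


def map_partition(partition, weights, n_features):
    """Map step: weighted error of each stump 'predict +1 if term j is present',
    computed only on the documents held by one partition."""
    errors = [0.0] * n_features
    for (x, y), w in zip(partition, weights):
        for j in range(n_features):
            pred = 1 if x[j] else -1
            if pred != y:
                errors[j] += w
    return errors


def reduce_errors(a, b):
    """Reduce step: add the per-stump errors of two partitions."""
    return [p + q for p, q in zip(a, b)]


def train(partitions, n_features, rounds):
    """Assumed AdaBoost loop; each round runs one map/reduce pass."""
    n = sum(len(p) for p in partitions)
    weights = [[1.0 / n] * len(p) for p in partitions]
    ensemble = []  # list of (feature index, polarity, alpha)
    for _ in range(rounds):
        partial = [map_partition(p, w, n_features)
                   for p, w in zip(partitions, weights)]
        errors = reduce(reduce_errors, partial)
        # Weights sum to 1, so the inverted stump has error 1 - e.
        best_j, best_pol, best_err = 0, 1, float("inf")
        for j, e in enumerate(errors):
            for pol, err in ((1, e), (-1, 1.0 - e)):
                if err < best_err:
                    best_j, best_pol, best_err = j, pol, err
        if best_err >= 0.5:
            break  # no stump is better than chance
        err = min(max(best_err, 1e-10), 1.0 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((best_j, best_pol, alpha))
        # Re-weight documents: misclassified ones gain weight.
        total = 0.0
        for p, w in zip(partitions, weights):
            for i, (x, y) in enumerate(p):
                pred = best_pol * (1 if x[best_j] else -1)
                w[i] *= math.exp(-alpha * y * pred)
                total += w[i]
        for w in weights:
            for i in range(len(w)):
                w[i] /= total
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all selected stumps."""
    score = sum(alpha * pol * (1 if x[j] else -1)
                for j, pol, alpha in ensemble)
    return 1 if score >= 0 else -1
```

In this sketch only the per-stump error vectors travel between the map and reduce steps, so the documents themselves never leave their partition. In a real deployment, the list comprehension over partitions would be replaced by whatever job-distribution mechanism the chosen framework provides.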
