Advances in present-day technology enable the production of huge amounts of information. Retrieving useful information from these large collections requires proper organization and structuring, and automatic text classification is a natural solution. However, current approaches focus on flat classification, in which each topic is treated as a separate class. This is inadequate when there are a large number of classes and a huge number of relevant features are needed to distinguish between them. This paper explores the use of hierarchical structure for classifying a large, heterogeneous collection of Amharic news text. The approach uses the hierarchical topic structure to decompose the classification task into a set of simpler problems, one at each node in the classification tree. An experiment was conducted on categorical data collected from the Ethiopian News Agency (ENA), using a support vector machine (SVM), to assess the performance of hierarchical classifiers on Amharic news text. The findings show that the accuracy of flat classification decreases as the number of classes and documents (features) increases. Moreover, the accuracy of the flat classifier decreases as the size of the top feature set increases; its peak accuracy was 68.84% when the top 3 features were used. For hierarchical classification, the performance of the classifiers increases as we move down the hierarchy. The maximum accuracy achieved was 90.37% at level 3 (the last level) of the category tree. Moreover, in contrast to the flat classifier, the accuracy of the hierarchical classifiers increases as the size of the top feature set increases. The peak accuracy was 89.06%, obtained with the level-three classifier when the top 15 features were used. Furthermore, the performance of the flat classifier and the hierarchical classifiers was compared using the same test data. The comparison shows that using the hierarchical structure during classification resulted in a significant improvement of 29.42% in exact-match precision compared with the flat classifier.
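The decomposition described above, one classifier at each node of the category tree, can be illustrated with a minimal sketch. It is not the implementation used in the experiment: a simple nearest-centroid scorer stands in for the SVM so that the example needs only the Python standard library, and the category names, English toy documents, and the helper names `CentroidNode`, `train_hierarchy`, `classify`, and `exact_match` are all hypothetical.

```python
from collections import Counter, defaultdict


class CentroidNode:
    """Stand-in node classifier (nearest centroid on word counts).

    Assumption: the paper uses an SVM at each node; this scorer only
    keeps the sketch dependency-free.
    """

    def fit(self, docs, labels):
        sums = defaultdict(Counter)
        counts = Counter(labels)
        for doc, label in zip(docs, labels):
            sums[label].update(doc.split())
        # Average word count per label.
        self.centroids = {
            label: {w: c / counts[label] for w, c in words.items()}
            for label, words in sums.items()
        }
        return self

    def predict(self, doc):
        tf = Counter(doc.split())

        def score(label):
            centroid = self.centroids[label]
            return sum(n * centroid.get(w, 0.0) for w, n in tf.items())

        # Sorting makes tie-breaking deterministic.
        return max(sorted(self.centroids), key=score)


def train_hierarchy(docs, paths):
    """Train one classifier per internal node of the category tree.

    Each path is a tuple of labels from the top level down to the leaf.
    A node is identified by the path prefix that leads to it.
    """
    by_node = defaultdict(lambda: ([], []))
    for doc, path in zip(docs, paths):
        for depth in range(len(path)):
            node_docs, node_labels = by_node[path[:depth]]
            node_docs.append(doc)
            node_labels.append(path[depth])
    return {node: CentroidNode().fit(d, l) for node, (d, l) in by_node.items()}


def classify(models, doc):
    """Route a document top-down, choosing one child at each node."""
    path = ()
    while path in models:
        path += (models[path].predict(doc),)
    return path


def exact_match(models, docs, paths):
    """Fraction of documents whose full predicted path equals the true path."""
    hits = sum(classify(models, d) == p for d, p in zip(docs, paths))
    return hits / len(docs)


# Hypothetical toy corpus with a two-level category tree.
train_docs = [
    "match goal striker league",
    "goal penalty league cup",
    "runner marathon race medal",
    "sprint race medal record",
    "bank loan interest rate",
    "bank interest inflation rate",
    "coffee export harvest farmers",
    "harvest farmers crop export",
]
train_paths = [
    ("sport", "football"), ("sport", "football"),
    ("sport", "athletics"), ("sport", "athletics"),
    ("economy", "finance"), ("economy", "finance"),
    ("economy", "agriculture"), ("economy", "agriculture"),
]

models = train_hierarchy(train_docs, train_paths)
print(classify(models, "marathon runner wins medal"))
print(classify(models, "farmers export coffee harvest"))
```

A flat classifier, by contrast, would train a single model over all leaf categories at once. In the hierarchical version each node only has to separate its own children, which is the sense in which the task becomes a set of simpler problems.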