Classifying Amharic News Text Using Self-Organizing Maps

The paper addresses the use of artificial neural networks (ANNs) for classifying Amharic news items. Amharic is the language of countrywide communication in Ethiopia and has its own writing system, which contains extensive systematic redundancy. It is quite dialectally diversified and probably representative of the languages of a continent that has so far received little attention in the language processing field. The experiments investigated clustering documents around user queries using Self-Organizing Maps (SOMs), an unsupervised neural network learning strategy. The best ANN model achieved a precision of 60.0% when clustering unseen data and 69.5% when classifying it.
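To make the Self-Organizing Map strategy more concrete, here is a minimal sketch in Python. It assumes that each news item has already been turned into a term-weight vector with values in [0, 1]. Several pieces are hypothetical illustrations and do not come from the paper: the map size, the epoch count, the linear learning-rate and neighborhood decay, the toy vocabulary and topic labels, and all function names (`train_som`, `best_matching_unit`, `label_units`, `classify`, `precision`). The majority-vote unit labelling, in particular, is one common way of turning a trained SOM into a classifier. It is an assumption here and is not taken from the paper.

```python
import math
import random
from collections import Counter


def best_matching_unit(weights, x):
    """Return (row, col) of the unit whose weight vector is nearest to x (squared Euclidean)."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(vectors, rows=3, cols=3, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOM; hypothetical linear decay of learning rate and radius."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    sigma0 = max(rows, cols) / 2.0
    total, step = epochs * len(vectors), 0
    for _ in range(epochs):
        order = list(range(len(vectors)))
        rng.shuffle(order)  # present documents in a random order each epoch
        for i in order:
            x = vectors[i]
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # learning rate decays towards 0
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood radius shrinks
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Gaussian neighborhood over grid distance to the winning unit
                    h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                    w = weights[r][c]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])  # pull the unit towards x
            step += 1
    return weights


def label_units(weights, vectors, labels):
    """Hypothetical step: each unit takes the majority label of the training docs it wins."""
    votes = {}
    for x, y in zip(vectors, labels):
        votes.setdefault(best_matching_unit(weights, x), Counter())[y] += 1
    return {unit: cnt.most_common(1)[0][0] for unit, cnt in votes.items()}


def classify(weights, unit_labels, x):
    """Label a document by its best-matching unit; None if that unit won no training doc."""
    return unit_labels.get(best_matching_unit(weights, x))


def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    retrieved = list(retrieved)
    if not retrieved:
        return 0.0
    return sum(1 for d in retrieved if d in relevant) / len(retrieved)


# Hypothetical toy term-frequency vectors over a 4-term vocabulary
docs = [[1, 1, 0, 0], [1, 0.8, 0, 0], [0, 0, 1, 1], [0, 0, 0.9, 1]]
topics = ["sport", "sport", "politics", "politics"]
som = train_som(docs, rows=2, cols=2)
units = label_units(som, docs, topics)
print(classify(som, units, [0.9, 1, 0, 0]))
```

Because training is unsupervised, topic labels enter only after the map has been trained, when each unit is tagged by the documents it wins. For an unseen document whose best-matching unit won no training document, `classify` returns `None`.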
