AN ENGLISH, FRENCH, AND GERMAN VIEW OF THE RUSSIAN INFORMATION AGENCY NOVOSTI NEWS

This paper presents the application of the SOMLib digital library system to a multilingual document corpus from the Russian Information Agency Novosti. News articles in Russian, English, and German are automatically organized into separate topic hierarchies by a novel unsupervised neural network, the Growing Hierarchical Self-Organizing Map. In addition, machine translation is used to create a coherent corpus in a single target language. Despite the “noise” introduced by automatic translation, the neural network produces a consistent topical structuring of the integrated document collection. This allows multilingual document collections to be browsed and explored directly in a given target language.
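The core idea of mapping documents onto a self-organizing map can be illustrated with a minimal sketch. It is not the SOMLib implementation and not the Growing Hierarchical Self-Organizing Map itself: it trains a single flat map of fixed size, whereas the growing hierarchical variant additionally adds units and child maps during training. The toy documents, the function names (`bag_of_words`, `train_som`, `best_unit`), and all parameter values are assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def bag_of_words(docs):
    """Turn raw strings into length-normalised term-frequency vectors."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    vectors = []
    for d in docs:
        counts = Counter(d.lower().split())
        v = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_unit(weights, v):
    """Grid position of the unit whose weight vector is closest to v."""
    return min(weights, key=lambda pos: sq_dist(weights[pos], v))


def train_som(vectors, rows=2, cols=2, epochs=50, seed=0):
    """Train a flat rows x cols map (assumed sizes, not taken from the paper)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # decays from 1 towards 0
        lr = 0.5 * frac                      # learning rate
        radius = max(rows, cols) / 2.0 * frac + 0.01
        for v in rng.sample(vectors, len(vectors)):  # shuffled pass
            br, bc = best_unit(weights, v)
            for (r, c), w in weights.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2 * radius ** 2))  # neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])
    return weights


# Hypothetical toy corpus standing in for translated news articles.
docs = [
    "election parliament vote government",
    "parliament government minister vote",
    "football match goal team",
    "team goal championship match",
]
vocab, vectors = bag_of_words(docs)
som = train_som(vectors)
for text, v in zip(docs, vectors):
    print(best_unit(som, v), text)
```

Each document is printed with the grid position of its best-matching unit. In a setup like the one described above, every language-specific or translated collection would be vectorised and trained in the same way, with the hierarchical model refining crowded units into further maps.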
