Arabic Script Documents Language Identifications Using Fuzzy ART

The volume of information available on the internet, intranets, digital libraries, and newsgroups has increased dramatically in recent years. Consequently, there is growing interest in helping users find, filter, and manage these resources. Language identification, which determines the language a text document is written in, is the first step in understanding that document, and it is usually a module within a multilingual application. In this paper, we introduce language identification of Arabic script documents based on letter frequency. The technique used for identification is fuzzy adaptive resonance theory (ART), which belongs to a family of neural network architectures that perform incremental unsupervised learning. Documents in languages written in Arabic script, namely Arabic, Persian, and Urdu, were used in the language identification experiments. From the experiments, we found that fuzzy ART is particularly promising in terms of language identification accuracy.
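
The approach summarized above (letter-frequency features clustered by fuzzy ART) can be illustrated with a minimal sketch. This is not the authors' implementation. The names `letter_frequencies`, `complement_code`, and `FuzzyART` are hypothetical, as are the parameter values `rho` (vigilance), `alpha` (choice parameter), and `beta` (learning rate). The update rules follow the standard fuzzy ART formulation with complement coding, and real letter-frequency vectors would likely need rescaling and a tuned vigilance value.

```python
from collections import Counter


def letter_frequencies(text, alphabet):
    """Relative frequency of each alphabet letter in text (values in [0, 1])."""
    counts = Counter(ch for ch in text if ch in alphabet)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(alphabet)
    return [counts[ch] / total for ch in alphabet]


def complement_code(a):
    """Complement coding: I = (a, 1 - a), so the city-block norm |I| is constant."""
    return list(a) + [1.0 - x for x in a]


class FuzzyART:
    """Sketch of fuzzy ART; parameter defaults are illustrative assumptions."""

    def __init__(self, rho=0.75, alpha=0.001, beta=1.0):
        self.rho = rho      # vigilance: minimum match ratio to accept a category
        self.alpha = alpha  # choice parameter (small positive constant)
        self.beta = beta    # learning rate; 1.0 corresponds to fast learning
        self.weights = []   # one weight vector per learned category

    def train(self, a):
        """Present one feature vector; return the index of the category it joins."""
        i = complement_code(a)
        norm_i = sum(i)
        # Score every category with the choice function T_j = |I ^ w_j| / (alpha + |w_j|).
        scored = []
        for j, w in enumerate(self.weights):
            match = sum(min(x, y) for x, y in zip(i, w))
            scored.append((match / (self.alpha + sum(w)), match, j))
        scored.sort(key=lambda t: (-t[0], t[2]))
        # Search in order of decreasing choice value until the vigilance test passes.
        for _, match, j in scored:
            if match / norm_i >= self.rho:
                w = self.weights[j]
                self.weights[j] = [
                    self.beta * min(x, y) + (1.0 - self.beta) * y
                    for x, y in zip(i, w)
                ]
                return j
        # No category resonates: commit a new one initialized to the input.
        self.weights.append(i)
        return len(self.weights) - 1
```

In use, each document would be converted with `letter_frequencies(text, alphabet)` for a chosen Arabic-script alphabet string and passed to `train`. Documents with similar letter distributions then fall into the same category, and a dissimilar one opens a new category without any language labels being supplied.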
