Logical labeling of Arabic newspapers using artificial neural nets

Logical structure analysis is an important phase in document image understanding. In this paper, we propose a learning-based method for labeling the logical components of Arabic newspaper documents. Labeling is driven by artificial neural nets, each specialized in one document class. The first prototype of the system, LUNET, has been tested on a set of Arabic newspapers from three document classes. Promising experimental results are reported.
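
The idea of one neural net per document class can be illustrated with a minimal sketch. It is not the LUNET implementation: the abstract does not give the network architecture, the input features, or the label set, so the `TinyNet` class, the two-feature block description (relative font size, relative vertical position), the class names, the labels, and the hand-set weights below are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class TinyNet:
    """One-hidden-layer feed-forward net with fixed weights (hypothetical)."""
    w_hidden: list  # one row of input weights per hidden unit
    w_out: list     # one row of hidden weights per output label
    labels: list    # logical labels, in the same order as the rows of w_out

    def predict(self, features):
        # Hidden layer: weighted sum of the features followed by tanh.
        hidden = [math.tanh(sum(w * x for w, x in zip(row, features)))
                  for row in self.w_hidden]
        # Output layer: one score per label; the highest score wins.
        scores = [sum(w * h for w, h in zip(row, hidden))
                  for row in self.w_out]
        return self.labels[scores.index(max(scores))]


# One net per document class. The weights are toy values set by hand,
# standing in for weights that would normally be learned from labeled pages.
NETS = {
    "class_a": TinyNet(w_hidden=[[1.0, 0.0], [0.0, 1.0]],
                       w_out=[[1.0, 0.0], [-1.0, 0.0]],
                       labels=["title", "body"]),
    "class_b": TinyNet(w_hidden=[[0.0, 1.0], [1.0, 0.0]],
                       w_out=[[1.0, 0.0], [-1.0, 0.0]],
                       labels=["title", "body"]),
}


def label_blocks(doc_class, blocks, nets=NETS):
    """Label each block with the net specialized in the page's document class."""
    net = nets[doc_class]
    return [net.predict(block) for block in blocks]


# Each block is [relative font size, relative vertical position] (assumed features).
page = [[1.0, 0.1], [-0.5, 0.6]]
print(label_blocks("class_a", page))
print(label_blocks("class_b", page))
```

The same page receives different labels depending on which class-specific net is selected, which is the point of specializing one net per document class.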
