Content: The Encoding Component in an Interlingual System for Man-Machine Communication in Natural Language

Arguably, any system that processes the content of a text written in a given language must tackle language analysis tasks. To reach the semantic representation of a sentence, the system needs a technique for lexical and syntactic disambiguation. Once the semantic representation is built, the system should be able to re-synthesize it into an acceptable sentence in the target language. This is not easy, however: many problems remain to be solved in both the analysis and the synthesis processes. To avoid the pitfalls of approaches that rely on intermediate representations, e.g. syntactic trees, this paper presents an approach in which the processing of Arabic content, and even the exchange of information among languages, starts directly from a semantic layer without passing through the level of syntax. The approach encodes Arabic structures into a set of semantic relations between nodes that represent the elements (words) of the sentence as concepts. Once the concepts are built, the relations between them are determined, and the result can be decoded into any other language. The grammar for the encoding process is implemented in the Universal Networking Language (UNL), which enables computers to understand natural languages and thus makes it possible for humans to communicate with machines in natural language. Encoding Arabic sentences as semantic networks depends mainly on the theta roles that hold between the arguments (the participants in the event or situation) of the sentence. Therefore, the arguments of natural-language predicates are classified into a closed set of types, each of which has a different status in the grammar.
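
The idea of encoding a sentence as labelled relations between concept nodes can be illustrated with a minimal sketch. It is not the paper's grammar or the UNL toolchain: the `Relation` class, the `encode` and `render` helpers, and the example concepts are hypothetical, and the labels `agt` (agent) and `obj` (object) are assumed here only as stand-ins for theta-role-like relations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One binary semantic relation between two concept nodes (hypothetical type)."""
    label: str   # theta-role-like label, e.g. "agt" or "obj" (assumed labels)
    source: str  # concept the relation starts from, here the predicate
    target: str  # concept that fills the role


def encode(predicate, roles):
    """Build one relation per (role label, argument) pair of a predicate."""
    return [Relation(label, predicate, arg) for label, arg in roles.items()]


def render(relations):
    """Format each relation as label(source, target)."""
    return [f"{r.label}({r.source}, {r.target})" for r in relations]


# Illustrative input only: a predicate with an agent and an object.
graph = encode("eat", {"agt": "boy", "obj": "apple"})
lines = render(graph)
```

In this sketch no syntactic tree is built; the sentence is represented only by its concepts and the roles linking them, which is the property the approach relies on when decoding into another language.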
