Constructing Multilingual Preterminological Graphs using various online-community resources

We describe the concept of dedicated Multilingual Preterminological Graphs (MPGs) and some automatic approaches for constructing them by analyzing the behavior of online community users. A Multilingual Preterminological Graph is a special lexical resource that contains a massive number of terms related to a specific domain and can be used as raw material for later building a standardized terminological repository. Building such a graph is difficult with traditional approaches, as it requires a huge effort from domain specialists and terminologists. In our approach, we build such a graph by analyzing the access log files of the community's website, finding the important terms that have been used to search that website and the associations between them. We aim to make this graph a seed repository to which multilingual volunteers can contribute. We are experimenting with this approach on the Digital Silk Road (DSR) Project. We have used its access log files since the project began in 2003 and obtained an initial graph of around 116,000 terms. As an application, we used this graph to obtain a preterminological multilingual database that serves a cross-language information retrieval (CLIR) system for the DSR project.
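The log-analysis step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Apache-style access log lines, a hypothetical `q` URL parameter carrying the search string, and a deliberately simple association rule in which two terms are linked when the same host searches for them one after the other.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse, parse_qs

# Assumed log layout: "<host> ... "GET <url> HTTP/x.y" ..." (Apache-style).
REQUEST_RE = re.compile(r'^(?P<host>\S+) .*?"GET (?P<url>\S+) HTTP/[\d.]+"')


def extract_queries(log_lines, param="q"):
    """Yield (host, term) for each request carrying a non-empty search term.

    `param` is a hypothetical parameter name; a real site may use another.
    """
    for line in log_lines:
        match = REQUEST_RE.match(line)
        if not match:
            continue
        values = parse_qs(urlparse(match.group("url")).query).get(param)
        if not values:
            continue
        term = values[0].strip().lower()
        if term:
            yield match.group("host"), term


def build_graph(log_lines):
    """Return (node_weight, edge_weight) built from search requests.

    Nodes are search terms weighted by how often they were searched.
    Edges link consecutive distinct terms issued by the same host
    (an illustrative heuristic, not the paper's association measure).
    """
    node_weight = defaultdict(int)
    edge_weight = defaultdict(int)
    last_term = {}
    for host, term in extract_queries(log_lines):
        node_weight[term] += 1
        previous = last_term.get(host)
        if previous is not None and previous != term:
            edge_weight[frozenset((previous, term))] += 1
        last_term[host] = term
    return dict(node_weight), dict(edge_weight)
```

Calling `build_graph` on an iterable of log lines returns two dictionaries: term frequencies and undirected edge counts keyed by `frozenset` pairs. Linking consecutive queries from one host is only one possible signal; session timeouts or click-through data could be substituted without changing the overall shape of the sketch.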
