Статистическая модель для распознавания смыслов в текстах иностранного языка с обучением на примерах из параллельных текстов (Statistical Model for Recognition of Senses in Foreign Language Texts Trained by Examples from Parallel Texts)

Recognizing senses (mentions of target situations, events, and facts) in foreign-language texts requires developing a syntactic analyzer and other linguistic components for that language. This report proposes an alternative approach to building a sense recognizer that does not require complex machine analysis of the text's language. The approach represents the sense recognizer as a statistical model made of n-tuples of words that occur close together in the text, with a few other words allowed between them. The model is trained using a corpus of parallel texts and a Russian linguistic analyzer. The analyzer extracts target senses from the Russian texts, and the fragments relevant to these senses are then selected in the parallel foreign-language texts. The report describes the results of sense-recognition experiments on a corpus of quasi-parallel Russian-Armenian news texts. It also describes a preliminary procedure for aligning fragments of the parallel texts.
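The abstract does not specify how tuples are extracted or scored. The Python sketch below shows one possible reading under several assumptions. Fragments are already tokenized. Each fragment is labeled positive if it is aligned with a Russian sentence in which the analyzer found the target sense, and negative otherwise. Tuples preserve word order, and at most `max_gap` words may separate consecutive members. A tuple is kept if it is frequent and precise enough among the positive fragments. The function names, parameters, thresholds, and toy English tokens are hypothetical and are not taken from the report.

```python
from collections import Counter


def gapped_tuples(tokens, n=2, max_gap=2):
    """Return the set of ordered n-tuples of words from `tokens` in which at
    most `max_gap` other words separate each pair of consecutive members.
    (Assumption: word order inside a tuple is preserved.)"""
    result = set()

    def extend(indices):
        if len(indices) == n:
            result.add(tuple(tokens[i] for i in indices))
            return
        last = indices[-1]
        # The next member may follow immediately or skip up to max_gap words.
        for j in range(last + 1, min(last + max_gap + 2, len(tokens))):
            extend(indices + [j])

    for start in range(len(tokens)):
        extend([start])
    return result


def train(positive, negative, n=2, max_gap=2, min_support=2, min_precision=0.8):
    """Hypothetical training step: keep tuples that appear in at least
    `min_support` positive fragments and whose share of positive occurrences
    is at least `min_precision`. Returns {tuple: precision}."""
    pos_counts, neg_counts = Counter(), Counter()
    for fragment in positive:  # fragments aligned with a sense found in Russian
        pos_counts.update(gapped_tuples(fragment, n, max_gap))
    for fragment in negative:  # fragments with no such sense
        neg_counts.update(gapped_tuples(fragment, n, max_gap))
    model = {}
    for tup, pos in pos_counts.items():
        precision = pos / (pos + neg_counts[tup])
        if pos >= min_support and precision >= min_precision:
            model[tup] = precision
    return model


def recognize(tokens, model, n=2, max_gap=2):
    """Score a new fragment by its best matching tuple (0.0 if none match)."""
    matches = [t for t in gapped_tuples(tokens, n, max_gap) if t in model]
    return max((model[t] for t in matches), default=0.0)


# Toy English tokens stand in for tokenized foreign-language fragments.
positive = [["police", "detained", "the", "suspect"],
            ["police", "have", "detained", "two", "men"]]
negative = [["the", "minister", "met", "police", "chiefs"]]
model = train(positive, negative)
print(model)  # {('police', 'detained'): 1.0}
print(recognize(["police", "reportedly", "detained", "a", "driver"], model))  # 1.0
```

In this sketch, each fragment contributes a tuple at most once, so counts behave like document frequencies. The gap limit lets a tuple such as ("police", "detained") match even when another word separates its members. The sketch does not model tokenization, the Russian analyzer, or fragment alignment.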