
Mining the Web for Bilingual Text

STRAND (Resnik, 1998) is a language-independent system for automatically discovering text in parallel translation on the World Wide Web. This paper extends the preliminary STRAND results by adding automatic language identification, scaling up by orders of magnitude, and formally evaluating performance. The most recent end product is an automatically acquired parallel corpus of 2,491 English-French document pairs, totaling approximately 1.5 million words per language.

Philip Resnik

[1] Mark W. Davis, et al. A TREC Evaluation of Query Translation Methods for Multi-Lingual Text Retrieval. 1995, TREC.

[2] John Cocke, et al. A Statistical Approach to Machine Translation. 1990, CL.

[3] Jean Carletta. Assessing Agreement on Classification Tasks: The Kappa Statistic. 1996, CL.

[4] Ellen M. Voorhees, et al. The Text REtrieval Conference (TREC-2001) (10th, Gaithersburg, Maryland, November 13-16, 2001). 2000, NIST Special Publication.

[5] A. Dobson, et al. Assessing Agreement. 1989, The Medical Journal of Australia.

[6] Douglas W. Oard. Cross-Language Text Retrieval Research in the USA. 1997.

[7] Philip Resnik. Parallel Strands: A Preliminary Investigation into Mining the Web for Bilingual Text. 1998, AMTA.

[8] Ellen M. Voorhees, et al. The Seventh Text REtrieval Conference (TREC-7). 1999, NIST.