The paper argues that the IBM statistical approach to machine translation has done rather better after a few years than many sceptics believed it could. However, it is neither as novel as its proponents suggest, nor are its claims as clear and simple as they would have us believe. The performance of the purely statistical system (and we discuss what that phrase could mean) has not equalled that of SYSTRAN. More importantly, the system is now being shifted to a hybrid that incorporates much of the linguistic information that IBM initially claimed would not be needed for MT. Hence, one might infer that its own proponents do not believe ‘pure’ statistics sufficient for MT of a usable quality. In addition to real limits on the statistical method, there are also strong economic limits imposed by the IBM group's methodology of data gathering. However, the paper concludes that the IBM group have done the field a great service in pushing these methods far further than before, and by reminding everyone of the virtues of empiricism in the field and the need for large-scale gathering of data.