Some of my Best Friends are Linguists

This article concerns the relationship between linguistics and the work carried out during 1972–1993 at IBM Research in automatic speech recognition (ASR) and natural language processing (NLP). Many statements I will make will be incomplete: I am not that conversant with the literature. I apologize to those whom I may offend. Conceivably it would have been much better to leave things alone and stay silent. Hopefully this journal will be willing to devote some of its pages to Letters to the Editor to correct the record or air opposing views. The starting point is the following quote attributed to me: "Whenever I fire a linguist our system performance improves." I have hoped for many years that this quote was only apocryphal, but at least two reliable witnesses have recently convinced me that I really stated this publicly in a conference talk (Jelinek, 1998). Accepting then that I really said it, I must first of all affirm that I never fired anyone, and a linguist least of all. So my motivation is defensive: to show that neither I nor my colleagues at IBM ever had any hostility to linguists or linguistics. In fact, we all hoped that linguists would provide us with needed help. We were never reluctant to include linguistic knowledge or intuition in our systems: if we didn’t succeed, it was because we didn’t find an efficient way to include it.
