Book Reviews: Foundations of Statistical Natural Language Processing

Statistical approaches to processing natural language text have become dominant in recent years. This foundational text is the first comprehensive introduction to statistical natural language processing (NLP). It presents the theory and algorithms needed to build NLP tools. It gives broad but rigorous coverage of the mathematical and linguistic foundations and discusses statistical methods in detail, so that students and researchers can construct their own implementations. Applications covered include collocation finding, word sense disambiguation, probabilistic parsing, and information retrieval.
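Collocation finding, the first application listed, is commonly introduced through association measures such as pointwise mutual information (PMI), PMI(x, y) = log2(P(x, y) / (P(x) P(y))). The following is a minimal sketch, not the book's own code. It assumes a corpus that is already tokenized, considers only adjacent word pairs, and uses maximum-likelihood probability estimates. The function name `pmi_collocations` and the `min_count` cutoff are illustrative choices.

```python
import math
from collections import Counter


def pmi_collocations(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Sketch only: probabilities are maximum-likelihood estimates from raw
    counts, and pairs seen fewer than `min_count` times are skipped,
    because PMI overrates rare events.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # number of adjacent pairs

    scores = {}
    for (x, y), c_xy in bigrams.items():
        if c_xy < min_count:
            continue
        p_xy = c_xy / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))

    # Highest-scoring (most strongly associated) pairs first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


tokens = "new york is big . new york is busy . the city is big".split()
for pair, score in pmi_collocations(tokens):
    print(pair, round(score, 3))
```

On this toy input, ("new", "york") ranks first. In real corpora, raw PMI still favors low-frequency pairs, which is why count cutoffs or alternative tests such as the t-test, chi-square, or likelihood ratios are often used instead.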
