Large-scale semi-supervised learning for natural language processing

Natural Language Processing (NLP) develops computational approaches to processing language data. Supervised machine learning has become the dominant methodology of modern NLP. The performance of a supervised NLP system crucially depends on the amount of data available for training. In the standard supervised framework, if a sequence of words was not encountered in the training set, the system can only guess at its label at test time. The cost of producing labeled training examples is a bottleneck for current NLP technology. On the other hand, a vast quantity of unlabeled data is freely available. This dissertation proposes effective, efficient, and versatile methodologies for 1) extracting useful information from very large (potentially web-scale) volumes of unlabeled data and 2) combining such information with standard supervised machine learning for NLP. This combination of learning from both labeled and unlabeled data is often referred to as semi-supervised learning. We demonstrate novel ways to exploit unlabeled data, we scale these approaches to make use of all the text on the web, and we show improvements on a variety of challenging NLP tasks. Although unlabeled data lacks manually provided labels, the statistics of unlabeled patterns can often distinguish the correct label for an ambiguous test instance. In the first part of this dissertation, we propose to use the counts of unlabeled patterns as features in supervised classifiers, with these classifiers trained on varying amounts of labeled data. We propose a general approach for integrating information from multiple, overlapping sequences of context for lexical disambiguation problems. We also show how standard machine learning algorithms can be modified to incorporate a particular kind of prior knowledge: knowledge of effective weightings for count-based features.
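To make the idea of using pattern counts from overlapping contexts concrete, here is a minimal Python sketch. It assumes a toy in-memory dictionary `TOY_COUNTS` standing in for web-scale n-gram counts, and a confusable pair ("there" vs. "their") as the lexical disambiguation problem. For each candidate, it fills the ambiguous slot, looks up every 2- and 3-gram window that overlaps that slot, and records one log-count feature per window position. The names `TOY_COUNTS`, `context_features`, and `sum_of_log_counts`, and all counts, are hypothetical illustrations, not the dissertation's actual system or data.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy n-gram counts standing in for web-scale statistics.
# Keys are word tuples; values are made-up corpus frequencies.
TOY_COUNTS: Dict[Tuple[str, ...], int] = {
    ("went", "there"): 900, ("went", "their"): 10,
    ("there", "yesterday"): 400, ("their", "yesterday"): 5,
    ("we", "went", "there"): 300, ("we", "went", "their"): 2,
    ("went", "there", "yesterday"): 120,
    ("there", "yesterday", "."): 80,
}


def context_features(tokens: List[str], pos: int, candidate: str,
                     counts: Dict[Tuple[str, ...], int],
                     min_n: int = 2, max_n: int = 3) -> Dict[str, float]:
    """One log-count feature per n-gram window that overlaps `pos`,
    with the token at `pos` replaced by `candidate`."""
    filled = tokens[:pos] + [candidate] + tokens[pos + 1:]
    feats: Dict[str, float] = {}
    for n in range(min_n, max_n + 1):
        # Every window [start, start + n) that contains position `pos`.
        for start in range(pos - n + 1, pos + 1):
            if start < 0 or start + n > len(filled):
                continue  # window would run off the sentence
            gram = tuple(filled[start:start + n])
            # The feature name encodes window size and offset, not the words,
            # so a classifier can learn one weight per context pattern.
            feats[f"n{n}_off{pos - start}"] = math.log(counts.get(gram, 0) + 1)
    return feats


def sum_of_log_counts(tokens: List[str], pos: int, candidates: List[str],
                      counts: Dict[Tuple[str, ...], int]
                      ) -> Tuple[str, Dict[str, float]]:
    """Unsupervised baseline: every window gets equal weight, and the
    candidate with the largest total log-count wins."""
    scores = {c: sum(context_features(tokens, pos, c, counts).values())
              for c in candidates}
    return max(scores, key=scores.get), scores


sentence = ["we", "went", "___", "yesterday", "."]
best, scores = sum_of_log_counts(sentence, 2, ["there", "their"], TOY_COUNTS)
print(best)  # there
```

Summing the log-counts amounts to giving every context window the same weight. In the supervised setting described above, the feature dictionaries returned by `context_features` would instead be passed to a classifier trained on labeled data, so that the learned weights decide how much each window should count.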
We also evaluate performance within and across domains for two generation and two analysis tasks, assessing the impact of combining web-scale counts with conventional features. In the second part of this dissertation, rather than using the aggregate statistics as features, we propose to use them to generate labeled training examples. By automatically labeling a large number of examples, we can train powerful discriminative models, leveraging fine-grained features of input words.
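The second idea, turning aggregate statistics into labeled training examples, can be sketched as follows. This is a minimal Python illustration that assumes a made-up table of verb-object co-occurrence counts (`PAIR_COUNTS`) and a selectional-preference-style task; the abstract does not name the tasks, so this choice is an assumption. The count threshold, the pseudo-negative sampling, the suffix feature, and the plain perceptron are all hypothetical illustrative choices, not the dissertation's actual procedure.

```python
import random
from typing import Dict, List, Tuple

Pair = Tuple[str, str]

# Hypothetical aggregate statistics: how often each (verb, object-noun)
# pair was seen in a large unlabeled corpus. All numbers are made up.
PAIR_COUNTS: Dict[Pair, int] = {
    ("eat", "pizza"): 500, ("eat", "apple"): 300, ("eat", "sandwich"): 200,
    ("drive", "car"): 800, ("drive", "truck"): 250, ("drive", "bus"): 150,
}


def auto_label(pair_counts: Dict[Pair, int], min_count: int = 100,
               negatives_per_positive: int = 1,
               seed: int = 0) -> List[Tuple[Pair, int]]:
    """Create labeled examples without human annotation: frequent pairs
    become positives (+1); the same verb paired with a noun it was never
    seen with becomes a pseudo-negative (-1)."""
    rng = random.Random(seed)
    nouns = sorted({noun for (_, noun) in pair_counts})
    examples: List[Tuple[Pair, int]] = []
    for (verb, noun), count in sorted(pair_counts.items()):
        if count < min_count:
            continue
        examples.append(((verb, noun), +1))
        unseen = [m for m in nouns if (verb, m) not in pair_counts]
        k = min(negatives_per_positive, len(unseen))
        for other in rng.sample(unseen, k):
            examples.append(((verb, other), -1))
    return examples


def features(verb: str, noun: str) -> List[str]:
    """Word-level features conjoined with the verb: the noun itself and its
    last two characters (a crude stand-in for richer word features)."""
    return [f"v={verb}|n={noun}", f"v={verb}|suf={noun[-2:]}"]


def score(w: Dict[str, float], verb: str, noun: str) -> float:
    return sum(w.get(f, 0.0) for f in features(verb, noun))


def train_perceptron(examples: List[Tuple[Pair, int]],
                     max_epochs: int = 50) -> Dict[str, float]:
    """Plain perceptron; stops early once an epoch makes no mistakes."""
    w: Dict[str, float] = {}
    for _ in range(max_epochs):
        mistakes = 0
        for (verb, noun), y in examples:
            if y * score(w, verb, noun) <= 0:  # wrong side or zero margin
                mistakes += 1
                for f in features(verb, noun):
                    w[f] = w.get(f, 0.0) + y
        if mistakes == 0:
            break
    return w


examples = auto_label(PAIR_COUNTS)
weights = train_perceptron(examples)
print(len(examples))  # 12
```

Because the labels come from counts rather than annotators, the number of training examples is bounded only by the size of the unlabeled corpus. The trained weights can then score pairs that never appeared in the count table, which is where word-level features such as the suffix feature are intended to help.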
