Integration of Rich Lexical Resources in a Probabilistic Syntactic Parser (Intégration de ressources lexicales riches dans un analyseur syntaxique probabiliste)

This thesis addresses the integration of French lexical and syntactic resources into two fundamental tasks of Natural Language Processing (NLP): probabilistic part-of-speech tagging and probabilistic syntactic parsing. We use lexical and syntactic data, created either by automatic processes or by linguists, to address two problems described briefly below: data sparsity and the automatic segmentation of texts.

Thanks to increasingly sophisticated parsing algorithms, parser performance keeps improving for many languages, including French. However, several problems are inherent to the mathematical formalisms used to model this task statistically (grammars, discriminative models, etc.). Data sparsity is one of them, and is caused mainly by the small size of the annotated corpora available for the language. Sparsity refers to the difficulty of estimating the probability of syntactic phenomena that occur in the texts to be parsed but are rare or absent in the corpus used to train the parsers. Moreover, it has been shown that sparsity is partly a lexical problem: the richer the inflection of a language, the less well lexical phenomena are represented in annotated corpora. Our first problem is therefore to mitigate the negative effect of lexical data sparsity on parser performance. To this end, we investigate a method called lexical clustering, which groups the words of the corpus and of the texts to be parsed into classes. These classes reduce the number of unknown words, and hence the number of rare or unknown lexicon-related syntactic phenomena in the texts to be parsed.
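
The effect of lexical clustering on unknown words can be illustrated with a minimal sketch. The class labels, the toy sentences, and the helper names `apply_clusters` and `unknown_rate` are hypothetical and chosen only for illustration; they are not taken from the thesis, whose classes are derived from syntactic lexicons of French.

```python
from typing import Dict, List, Set


def apply_clusters(tokens: List[str], word_to_class: Dict[str, str]) -> List[str]:
    """Replace each token by its class label when one is known; keep it otherwise."""
    return [word_to_class.get(tok, tok) for tok in tokens]


def unknown_rate(test_tokens: List[str], train_vocab: Set[str]) -> float:
    """Fraction of test tokens that never occur in the training vocabulary."""
    if not test_tokens:
        return 0.0
    unseen = sum(1 for tok in test_tokens if tok not in train_vocab)
    return unseen / len(test_tokens)


# Toy, hand-made data (hypothetical; a real setting would use a treebank
# for training and classes built from a syntactic lexicon).
WORD_TO_CLASS = {
    "mange": "V_TRANS", "mangeait": "V_TRANS", "dévore": "V_TRANS",
    "dort": "V_INTRANS", "dormait": "V_INTRANS",
}
TRAIN = ["le", "chat", "mange", "la", "souris", "le", "chien", "dort"]
TEST = ["le", "chien", "dévore", "la", "souris", "le", "chat", "dormait"]

# Unknown-word rate on raw word forms.
raw_rate = unknown_rate(TEST, set(TRAIN))

# Unknown-word rate once both sides are mapped to classes.
clustered_rate = unknown_rate(
    apply_clusters(TEST, WORD_TO_CLASS),
    set(apply_clusters(TRAIN, WORD_TO_CLASS)),
)

print(f"raw: {raw_rate:.2f}, clustered: {clustered_rate:.2f}")
```

In this toy setting the two verb forms unseen in training fall into classes that were seen, so the unknown-word rate drops; how much is gained on real data depends on the quality and coverage of the classes.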
Our objective is therefore to propose lexical clusterings built from information drawn from French syntactic lexicons, and to observe their impact on parser performance.

Furthermore, most evaluations of probabilistic part-of-speech tagging and probabilistic parsing have been carried out with a perfect segmentation of the text, since it is identical to that of the evaluated corpus. In real applications, however, the segmentation of a text is very rarely available, and current automatic segmenters are far from producing good-quality segmentation, because of the many multiword units (compound words, named entities, etc.). In this thesis, we focus on so-called continuous multiword units that form lexical units to which a part-of-speech tag can be assigned, and which we call compound words. For example, cordon bleu is a compound noun, and tout à fait a compound adverb. The task of identifying compound words can be treated as a text segmentation task. Our second problem therefore concerns the automatic segmentation of French texts and its impact on the performance of automatic processes. To address it, we study an approach that couples, within a single probabilistic model, compound-word recognition with another automatic task, which in our case is either syntactic parsing or part-of-speech tagging. Compound-word recognition is thus performed inside the probabilistic process rather than in a preliminary phase. Our objective is to propose innovative strategies for integrating compound-word resources into two probabilistic processes that combine tagging or parsing with text segmentation.
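
One possible way to let a single tagger predict segmentation and part-of-speech tags together is to fold the position of each token within its lexical unit into the tag itself. The sketch below assumes such a B/I-style label encoding purely for illustration; the abstract does not specify the encoding actually used, and the tag names and the `encode`/`decode` helpers are hypothetical.

```python
from typing import List, Tuple

# A lexical unit: the tokens it spans and its part-of-speech tag.
Unit = Tuple[List[str], str]


def encode(units: List[Unit]) -> List[Tuple[str, str]]:
    """Flatten units into (token, label) pairs.

    The label is TAG+B for the first token of a unit and TAG+I for the
    following tokens, so one sequence labeler predicts both the tag and
    the compound boundaries.
    """
    pairs = []
    for tokens, tag in units:
        for i, tok in enumerate(tokens):
            pairs.append((tok, f"{tag}+{'B' if i == 0 else 'I'}"))
    return pairs


def decode(pairs: List[Tuple[str, str]]) -> List[Unit]:
    """Rebuild lexical units (segmentation and tags) from per-token labels."""
    units: List[Unit] = []
    for tok, label in pairs:
        tag, position = label.rsplit("+", 1)
        if position == "I" and units and units[-1][1] == tag:
            units[-1][0].append(tok)  # continue the current compound
        else:
            units.append(([tok], tag))  # start a new unit
    return units


# Toy sentence with one compound adverb (tag names are illustrative).
EXAMPLE: List[Unit] = [
    (["il"], "CL"),
    (["est"], "V"),
    (["tout", "à", "fait"], "ADV"),
    (["prêt"], "ADJ"),
]

labeled = encode(EXAMPLE)
restored = decode(labeled)
```

With this encoding, recovering the segmentation is a deterministic post-processing step over the predicted labels, so no separate compound-recognition phase is needed before tagging.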
