Placing Multiword Expressions at the Heart of Automatic Text Analysis: On the Use of External Symbolic Resources

In this dissertation, we retrace the research we have carried out over more than 10 years. One of our main goals has been to improve the linguistic granularity of various NLP tasks by taking multiword expressions into account. In particular, our guiding idea has been to exploit rich lexical resources and to couple them with various probabilistic models or hybrid procedures. Our work can be divided into three lines of research. The first concerns part-of-speech tagging and syntactic parsing. Integrating multiword expression recognition into these tasks has mainly consisted of adapting various probabilistic models dedicated to them. Because these expressions are, by definition, hard to predict, exploiting lexical resources is essential for recognizing them. We were therefore led to devise strategies for integrating external symbolic resources into our models. The second line consists of integrating multiword expression recognition into applications. In particular, we developed applications for the private sector (information extraction, classification) and for academia (support for building bilingual lexicons and for linguistic studies). In all cases, we relied on fine-grained preprocessing fed by rich lexical resources. The third line concerns the construction of linguistic resources. The tools described above can only be developed if resources (annotated corpora or lexicons) exist, yet resources covering multiword expressions are sorely lacking or incomplete. For every resource we developed, we carried out fine-grained, systematic linguistic studies. We also built software tooling to manage these resources and apply them to texts.
