Informative quality estimation of machine translation output

Despite recent advances in machine translation (MT), today's MT systems cannot guarantee that the sentences they produce will be fluent and coherent in both syntax and semantics. As post-editing of MT output has become common practice in fast-paced Computer-Assisted Translation (CAT) workflows, research on Quality Estimation (QE) has thrived in recent years. Despite the link between MT errors and the cognitive effort involved in correcting them, current QE studies often focus on finding informative features that capture the monolingual and bilingual properties of a given source/MT output pair, and they estimate overall post-editing effort at the word, sentence or document level without distinguishing between MT error types. In this thesis, we present a comprehensive approach to automatic error detection as a basis for understanding the relationship between different types of MT errors and the corresponding post-editing effort. In doing so, we take a first step towards informative quality estimation systems for machine translation, that is, systems that can explain the basis for their quality estimates. To study the relationship between MT errors and post-editing effort on a large scale, we developed an error taxonomy and a corpus of MT errors originating from statistical (SMT), rule-based (RBMT) and neural (NMT) machine translation systems for English-Dutch, and obtained post-edited versions of the MT output in this corpus. The error taxonomy is grounded in the translation quality assessment literature and allows for MT-specific, fine-grained error annotation based on the main distinction between accuracy and fluency errors. Moreover, the hier-
