Make It Simple with Paraphrases

This book presents a novel scientific approach to improving machine translation by paraphrasing support verb constructions with semantically equivalent verbs (e.g., make a presentation of / present). The author demonstrates that this strategy has a positive impact on machine translation. The study is reproducible, can be extended to other linguistic phenomena, and has been successfully applied to natural language processing applications with different purposes. The author shows how authoring aids can use paraphrases to help simplify and clarify texts, with clear benefits for linguistic quality assurance in text processing. While addressing and providing a solution for a specific linguistic problem, this book presents a comprehensive theoretical background and an exposition of conceptual problems that will interest natural language processing professionals, linguists, translators, and students. Written in simple language, this book will be easily understood by non-specialists who have an interest in natural language.
