Automatic Generation of Syntactically Well-formed and Semantically Appropriate Paraphrases

Paraphrases of an expression are alternative linguistic expressions conveying the same information as the original. Technology for handling paraphrases has been attracting increasing attention due to its potential in a wide range of natural language processing applications, e.g., machine translation, information retrieval, question answering, summarization, authoring and revision support, and reading assistance. In this thesis, we focus on lexical and structural paraphrases in Japanese, such as lexical and phrasal replacement, verb alternation, and topicalization, which can be generated relying on linguistic knowledge only. First, we address how to generate well-formed and appropriate paraphrases. One of the major problems is that it is practically impossible to take into account all the semantic and discourse-related factors that affect the well-formedness and appropriateness of paraphrases. The knowledge used for paraphrase generation, such as transformation rules, therefore tends to be underspecified and produces erroneous output. To cope with this, a revision process is introduced to detect and correct ill-formed and inappropriate candidates generated in the transfer stage. Within this transfer-and-revision framework, we first investigate what types of errors tend to occur in lexical and structural paraphrasing, and confirm the feasibility of the framework by showing that most errors occur irrespective of the class of transformation rule. On the basis of another observation, that errors associated with case assignments form one of the major error types, we develop a model for detecting this type of error. The model utilizes a large collection of positive examples and a small collection of negative ones by combining supervised and unsupervised machine learning methods. Experimental results indicate that our model significantly outperforms conventional models. The second issue is to develop a mechanism that is capable of covering a wide variety of paraphrases.
One way of increasing the coverage of paraphrase generation is to exploit the systemicity underlying several classes of paraphrases, such as verb alternation and compound noun decomposition. To capture the semantic properties required for generating these classes of paraphrases, we utilize the Lexical Conceptual Structure (LCS). This framework represents a verb as a semantic structure that encodes the focus of statement and the relationships between semantic arguments and syntactic cases. We implement a paraphrase generation model which consists of a case assignment rule and a handful of LCS transformation rules, with particular focus on

∗Doctoral Dissertation, Department of Information Processing, Graduate School of Information Science, Nara Institute of Science and Technology, NAIST-IS-DD0261023, March 2005.
