Measuring the Limit of Semantic Divergence for English Tweets

In human language, the same meaning can be conveyed in many ways by different people. Even the same person may express the same sentence quite differently when addressing different audiences, using different modalities, or using different syntactic variations or a different vocabulary. The possibility of such endless surface forms of a text, while its meaning remains almost the same, poses many challenges for Natural Language Processing (NLP) systems such as question answering, machine translation, and text summarization. This paper is an endeavor to understand the characteristics of such endless semantic divergence. In this work, we develop a corpus of 1525 semantically divergent sentences for 200 English tweets.
