Challenging the Boundaries of Unsupervised Learning for Semantic Similarity

Semantic analysis plays a crucial role in text analytics research. Calculating the semantic similarity between sentences is a long-standing problem in natural language processing, and the task differs significantly from one domain to another. In this paper, we present a methodology that can be applied across multiple domains by incorporating corpus-based statistics into a standardized semantic similarity algorithm. To calculate the semantic similarity between words and sentences, the proposed method follows an edge-based approach using a lexical database. When tested on both benchmark standards and a mean human similarity dataset, the methodology achieves a high correlation for both word similarity ($r=0.8753$) and sentence similarity ($r=0.8793$) on the Rubenstein and Goodenough standard, and on the SICK dataset ($r=0.8324$), outperforming other unsupervised models. The SICK result is obtained after eliminating the outliers, which constitute 3.75% of the 4927 statement pairs.
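As a rough illustration of what an edge-based measure and a correlation check can look like, the sketch below counts edges between two words in a tiny hand-written taxonomy and computes Pearson's r between two score lists. Everything in it is an assumption made for illustration: the `PARENT` taxonomy is a hypothetical stand-in for a real lexical database, the `1 / (1 + distance)` mapping is just one common way to turn edge counts into a similarity score, and none of it is taken from the paper's actual algorithm.

```python
from math import sqrt

# Hypothetical toy taxonomy (child -> parent), standing in for a lexical database.
PARENT = {
    "car": "vehicle",
    "bicycle": "vehicle",
    "vehicle": "artifact",
    "hammer": "tool",
    "tool": "artifact",
    "artifact": "entity",
    "dog": "animal",
    "animal": "entity",
}


def path_to_root(word):
    """Return the chain of nodes from `word` up to the root, inclusive."""
    chain = [word]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def edge_distance(a, b):
    """Count edges on the path from a to b through their lowest common ancestor."""
    steps_from_b = {node: i for i, node in enumerate(path_to_root(b))}
    for steps_from_a, node in enumerate(path_to_root(a)):
        if node in steps_from_b:
            return steps_from_a + steps_from_b[node]
    raise ValueError("words share no common ancestor")


def path_similarity(a, b):
    """Map edge distance into (0, 1]; identical words score 1.0 (assumed mapping)."""
    return 1.0 / (1.0 + edge_distance(a, b))


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)
```

In an evaluation of the kind the abstract describes, `pearson_r` would be applied to a list of model scores and the matching list of mean human ratings for the same word or sentence pairs.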

Vijay K. Mago | Atish Pawar

[1] Jeffrey Dean, et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.

[2] Euripides G. M. Petrakis, et al. Semantic similarity methods in wordNet and their application to information retrieval on the web, 2005, WIDM '05.

[3] Alice Lai, et al. Illinois-LH: A Denotational and Distributional Approach to Semantics, 2014, SemEval.

[4] Christopher D. Manning, et al. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks, 2015, ACL.

[5] Jimmy J. Lin, et al. Multi-Perspective Sentence Similarity Modeling with Convolutional Neural Networks, 2015, EMNLP.

[6] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.

[7] Carole A. Goble, et al. Investigating Semantic Similarity Measures Across the Gene Ontology: The Relationship Between Sequence and Annotation, 2003, Bioinform.

[8] Philip Resnik, et al. Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language, 1999, J. Artif. Intell. Res.

[9] Jinwoo Park, et al. Improving text categorization using the importance of sentences, 2004, Inf. Process. Manag.

[10] Mitsuru Ishizuka, et al. Keyword extraction from a single document using word co-occurrence statistical information, 2004, Int. J. Artif. Intell. Tools.

[11] Phillip W. Lord, et al. Semantic Similarity in Biomedical Ontologies, 2009, PLoS Comput. Biol.

[12] Vibhanshu Abhishek, et al. Keyword generation for search engine advertising using semantic similarity between terms, 2007, ICEC.

[13] Jonas Mueller, et al. Siamese Recurrent Architectures for Learning Sentence Similarity, 2016, AAAI.

[14] John B. Goodenough, et al. Contextual correlates of synonymy, 1965, CACM.

[15] Danushka Bollegala, et al. Measuring semantic similarity between words using web search engines, 2007, WWW '07.

[16] John Sinclair, et al. Looking up: an account of the COBUILD Project in lexical computing and the development of the Collins COBUILD English Language Dictionary, 1987.

[17] Zuhair Bandar, et al. Sentence similarity based on semantic nets and corpus statistics, 2006, IEEE Transactions on Knowledge and Data Engineering.

[18] Diana Inkpen, et al. Semantic text similarity using corpus-based word similarity and string similarity, 2008, ACM Trans. Knowl. Discov. Data.

[19] Paul M. B. Vitányi, et al. The Google Similarity Distance, 2004, IEEE Transactions on Knowledge and Data Engineering.

[20] Dragomir R. Radev, et al. LexRank: Graph-based Lexical Centrality as Salience in Text Summarization, 2004, J. Artif. Intell. Res.

[21] A. D. Baddeley, et al. Short-term Memory for Word Sequences as a Function of Acoustic, Semantic and Formal Similarity, 1966, The Quarterly Journal of Experimental Psychology.

[22] Mihai Surdeanu, et al. The Stanford CoreNLP Natural Language Processing Toolkit, 2014, ACL.

[23] Liang Xiao, et al. Wider and Deeper, Cheaper and Faster: Tensorized LSTMs for Sequence Learning, 2017, NIPS.

[24] George A. Miller, et al. WordNet: A Lexical Database for English, 1995, HLT.

[25] Malvina Nissim, et al. The Meaning Factory: Formal Semantics for Recognizing Textual Entailment and Determining Semantic Similarity, 2014, SemEval.

[26] G. Miller, et al. Contextual correlates of semantic similarity, 1991.

[27] Man Lan, et al. ECNU: One Stone Two Birds: Ensemble of Heterogenous Measures for Semantic Relatedness and Textual Entailment, 2014, SemEval.

[28] Alexander F. Gelbukh, et al. UNAL-NLP: Combining Soft Cardinality Features for Semantic Textual Similarity, Relatedness and Entailment, 2014, SemEval.

[29] Dekang Lin, et al. An Information-Theoretic Definition of Similarity, 1998, ICML.

[30] Steven Bird, et al. NLTK: The Natural Language Toolkit, 2002, ACL.

[31] Marco Marelli, et al. A SICK cure for the evaluation of compositional distributional semantic models, 2014, LREC.

[32] Ted Pedersen, et al. Measures of semantic similarity and relatedness in the biomedical domain, 2007, J. Biomed. Informatics.

[33] David W. Conrath, et al. Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy, 1997, ROCLING/IJCLCLP.

[34] Ted Pedersen, et al. Maximizing Semantic Relatedness to Perform Word Sense Disambiguation, 2005.

[35] Charles T. Meadow, et al. Text information retrieval systems, 1992.

[36] Seán O'Riain, et al. Querying Linked Data Using Semantic Relatedness: A Vocabulary Independent Approach, 2011, NLDB.

[37] Jia Wei Chang, et al. A Grammar-Based Semantic Similarity Algorithm for Natural Language Sentences, 2014, TheScientificWorldJournal.