Quantifying the Effects of Text Duplication on Semantic Models
Laure Thompson | David M. Mimno | Alexandra Schofield
[1] David M. Blei, et al. Bayesian Checking for Topic Models, 2011, EMNLP.
[2] Geoffrey Zweig, et al. Syntactic Clustering of the Web, 1997, Comput. Networks.
[3] Ken Lang, et al. NewsWeeder: Learning to Filter Netnews, 1995, ICML.
[4] John Lee, et al. A Computational Model of Text Reuse in Ancient Literary Texts, 2007, ACL.
[5] Yulia Tsvetkov, et al. Problems With Evaluation of Word Embeddings Using Word Similarity Tasks, 2016, RepEval@ACL.
[6] Timothy Baldwin, et al. Machine Reading Tea Leaves: Automatically Evaluating Topic Coherence and Topic Model Quality, 2014, EACL.
[7] Andrew McCallum, et al. Rethinking LDA: Why Priors Matter, 2009, NIPS.
[8] Iryna Gurevych, et al. Text Reuse Detection using a Composition of Text Similarity Measures, 2012, COLING.
[9] Benno Stein, et al. New Issues in Near-duplicate Detection, 2007, GfKl.
[10] Yorick Wilks, et al. Measuring Text Reuse, 2002, ACL.
[11] Andrew McCallum, et al. Optimizing Semantic Coherence in Topic Models, 2011, EMNLP.
[12] Daniel Barbará, et al. Topic Significance Ranking of LDA Generative Models, 2009, ECML/PKDD.
[13] Ehud Rivlin, et al. Placing search in context: the concept revisited, 2002, TOIS.
[14] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.
[15] Michael I. Jordan, et al. Latent Dirichlet Allocation, 2001, J. Mach. Learn. Res.
[16] Ruslan Salakhutdinov, et al. Evaluation methods for topic models, 2009, ICML '09.
[17] Gurmeet Singh Manku, et al. Detecting near-duplicates for web crawling, 2007, WWW '07.
[18] David A. Smith, et al. Infectious texts: Modeling text reuse in nineteenth-century newspapers, 2013, 2013 IEEE International Conference on Big Data.
[19] Felix Hill, et al. SimLex-999: Evaluating Semantic Models With (Genuine) Similarity Estimation, 2014, CL.
[20] Paul Clough, et al. Old and new challenges in automatic plagiarism detection, 2003.
[21] Adam Lopez, et al. Proceedings of the 1st Workshop on Evaluating Vector Space Representations for NLP, 2016.
[22] Gemma Boleda, et al. Distributional Semantics in Technicolor, 2012, ACL.
[23] Paul Ginsparg, et al. Patterns of text reuse in a scientific corpus, 2014, Proceedings of the National Academy of Sciences.
[24] 悠太 菊池, et al. 大規模要約資源としてのNew York Times Annotated Corpus, 2015.
[25] Timothy Baldwin, et al. Automatic Evaluation of Topic Coherence, 2010, NAACL.
[26] Mark Stevenson, et al. Evaluating Topic Coherence Using Distributional Semantics, 2013, IWCS.
[27] T. Landauer, et al. Indexing by Latent Semantic Analysis, 1990.
[28] Chong Wang, et al. Reading Tea Leaves: How Humans Interpret Topic Models, 2009, NIPS.
[29] Sanjeev Arora, et al. A Practical Algorithm for Topic Modeling with Provable Guarantees, 2012, ICML.
[30] Richard A. Harshman, et al. Indexing by Latent Semantic Analysis, 1990, J. Am. Soc. Inf. Sci.
[31] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.