N-gram IDF: A Global Term Weighting Scheme Based on Information Distance
[1] Don R. Swanson et al. Probabilistic models for automatic indexing, 1974, J. Am. Soc. Inf. Sci.
[2] Stephen P. Harter et al. A probabilistic approach to automatic keyword indexing, 1974.
[3] Stephen P. Harter et al. A probabilistic approach to automatic keyword indexing. Part I. On the Distribution of Specialty Words in a Technical Literature, 1975, J. Am. Soc. Inf. Sci.
[4] Gerard Salton et al. A vector space model for automatic indexing, 1975, CACM.
[5] Michael McGill et al. Introduction to Modern Information Retrieval, 1983.
[6] David Haussler et al. Complete inverted files for efficient text retrieval and analysis, 1987, JACM.
[7] Kenneth Ward Church et al. Word Association Norms, Mutual Information, and Lexicography, 1989, ACL.
[8] Stephen E. Robertson et al. Okapi at TREC-3, 1994, TREC.
[9] Ming Li et al. An Introduction to Kolmogorov Complexity and Its Applications, 2019, Texts in Computer Science.
[10] A. Shiryayev. On Tables of Random Numbers, 1993.
[11] Stephen E. Robertson et al. Gatford Centre for Interactive Systems Research Department of Information, 1996.
[12] Kenneth Ward Church et al. Poisson mixtures, 1995, Natural Language Engineering.
[13] William I. Gasarch et al. Book Review: An introduction to Kolmogorov Complexity and its Applications Second Edition, 1997 by Ming Li and Paul Vitanyi (Springer (Graduate Text Series)), 1997, SIGACT News.
[14] W. R. Greiff et al. A theory of term weighting based on exploratory data analysis, 1998, SIGIR '98.
[15] R. Landauer et al. Irreversibility and heat generation in the computing process, 1961, IBM J. Res. Dev.
[16] Djoerd Hiemstra et al. A probabilistic justification for using tf×idf term weighting in information retrieval, 2000, International Journal on Digital Libraries.
[17] Kishore Papineni et al. Why Inverse Document Frequency?, 2001, NAACL.
[18] Daniel Jurafsky et al. Is Knowledge-Free Induction of Multiword Unit Dictionary Headwords a Solved Problem?, 2001, EMNLP.
[19] Kenneth Ward Church et al. Using Suffix Arrays to Compute Term Frequency and Document Frequency for All Substrings in a Corpus, 2001, Computational Linguistics.
[20] Benjamin C. M. Fung et al. Hierarchical Document Clustering using Frequent Itemsets, 2003, SDM.
[21] Andrew Zisserman et al. Video Google: a text retrieval approach to object matching in videos, 2003, Proceedings Ninth IEEE International Conference on Computer Vision.
[22] Akiko Aizawa et al. An information-theoretic perspective of tf-idf measures, 2003, Inf. Process. Manag.
[23] Roberto Grossi et al. High-order entropy-compressed text indexes, 2003, SODA '03.
[24] Enno Ohlebusch et al. Replacing suffix trees with enhanced suffix arrays, 2004, J. Discrete Algorithms.
[25] Karen Spärck Jones. A statistical interpretation of term specificity and its application in retrieval, 1972, J. Documentation.
[26] Stephen E. Robertson et al. Understanding inverse document frequency: on theoretical arguments for IDF, 2004, J. Documentation.
[27] Constantin Orasan et al. A Comparison of Summarisation Methods Based on Term Specificity Estimation, 2004, LREC.
[28] Paul M. B. Vitányi et al. Clustering by compression, 2003, IEEE Transactions on Information Theory.
[29] Pavel Pecina. An Extensive Empirical Study of Collocation Extraction Methods, 2005, ACL.
[30] Tommi S. Jaakkola et al. Using term informativeness for named entity detection, 2005, SIGIR '05.
[31] Ali R. Hurson et al. TF-ICF: A New Term Weighting Scheme for Clustering Dynamic Data Streams, 2006, 2006 5th International Conference on Machine Learning and Applications (ICMLA'06).
[32] Paul M. B. Vitányi et al. The Google Similarity Distance, 2004, IEEE Transactions on Knowledge and Data Engineering.
[33] Donald Metzler et al. Generalized inverse document frequency, 2008, CIKM '08.
[34] Thomas Roelleke et al. TF-IDF uncovered: a study of theories and probabilities, 2008, SIGIR '08.
[35] Kam-Fai Wong et al. Interpreting TF-IDF term weights as making relevance decisions, 2008, TOIS.
[36] Jun'ichi Tsujii et al. Text Categorization with All Substring Features, 2009, SDM.
[37] Gerlof Bouma et al. Normalized (pointwise) mutual information in collocation extraction, 2009.
[38] Jian Su et al. Supervised and Traditional Term Weighting Methods for Automatic Text Categorization, 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[39] Tu Bao Ho et al. Improving effectiveness of mutual information for substantival multiword expression extraction, 2009, Expert Syst. Appl.
[40] J. Silva et al. A Local Maxima method and a Fair Dispersion Normalization for extracting multi-word units from corpora, 2009.
[41] Vincent Ng et al. Conundrums in Unsupervised Keyphrase Extraction: Making Sense of the State-of-the-Art, 2010, COLING.
[42] Christopher D. Manning et al. Introduction to Information Retrieval, 2010, J. Assoc. Inf. Sci. Technol.
[43] Péter Gács et al. Information Distance, 1998, IEEE Trans. Inf. Theory.
[44] Xiaoyan Zhu et al. Measuring the Non-compositionality of Multiword Expressions, 2010, COLING.
[45] Rishiraj Saha Roy et al. Unsupervised query segmentation using only query logs, 2011, WWW.
[46] Rishiraj Saha Roy et al. An IR-based evaluation framework for web search query segmentation, 2012, SIGIR '12.
[47] T. Honkela et al. Term Weighting in Short Documents for Document Categorization, Keyword Extraction and Query Expansion, 2012.
[48] Gonzalo Navarro et al. New algorithms on wavelet trees and applications to information retrieval, 2010, Theor. Comput. Sci.
[49] Michalis Vazirgiannis et al. Graph-of-word and TW-IDF: new approach to ad hoc IR, 2013, CIKM.
[50] Hideo Bannai et al. Efficient Computation of Substring Equivalence Classes with Suffix Arrays, 2007, Algorithmica.