Using two-stage conditional word frequency models to model word burstiness and motivating TF-IDF
[1] Charles Elkan, et al. Clustering documents with an exponential-family approximation of the Dirichlet compound multinomial distribution, 2006, ICML.
[2] David Kauchak, et al. Modeling word burstiness using the Dirichlet distribution, 2005, ICML.
[3] M. Newman. Power laws, Pareto distributions and Zipf's law, 2005.
[4] Djoerd Hiemstra, et al. A Linguistically Motivated Probabilistic Model of Information Retrieval, 1998, ECDL.
[5] Stephen E. Robertson, et al. Understanding inverse document frequency: on theoretical arguments for IDF, 2004, J. Documentation.
[6] H. Simon, et al. On a Class of Skew Distribution Functions, 1955.
[7] Renato De Mori, et al. A Cache-Based Natural Language Model for Speech Recognition, 1990, IEEE Trans. Pattern Anal. Mach. Intell.
[8] Thomas L. Griffiths, et al. Interpolating between types and tokens by estimating power-law generators, 2005, NIPS.
[9] David B. Lindenmayer, et al. Modeling Count Data of Rare Species: Some Statistical Issues, 2005.
[10] Charles Elkan, et al. Deriving TF-IDF as a Fisher Kernel, 2005, SPIRE.
[11] F. Chung, et al. Generalizations of Polya's urn Problem, 2003.
[12] P. Manley, et al. The Multiple Species Inventory and Monitoring Protocol: A Population, Community, and Biodiversity Monitoring Solution for National Forest System Lands, 2006.
[13] C. Wagner. Commuting Probability Revisions: The Uniformity Rule, 2003.
[14] Dennis Day, et al. The multivariate Polya distribution in combat modeling, 2001.
[15] Kenneth Ward Church. Empirical Estimates of Adaptation: The chance of Two Noriegas is closer to p/2 than p^2, 2000, COLING.
[16] Karen Spärck Jones. A statistical interpretation of term specificity and its application in retrieval, 1972, J. Documentation.