Documents as a Bag of Maximal Substrings - An Unsupervised Feature Extraction for Document Clustering
[1] Jun'ichi Tsujii,et al. Text Categorization with All Substring Features , 2009, SDM.
[2] Kevin Kok Wai Wong,et al. A SOM-Based Document Clustering Using Frequent Max Substrings for Non-Segmented Texts , 2010, J. Intell. Learn. Syst. Appl.
[3] Yee Whye Teh,et al. A Hierarchical Bayesian Language Model Based On Pitman-Yor Processes , 2006, ACL.
[4] Enno Ohlebusch,et al. Optimal Exact String Matching Based on Suffix Arrays , 2002, SPIRE.
[5] Hoifung Poon,et al. Unsupervised Morphological Segmentation with Log-Linear Models , 2009, NAACL.
[6] David Kauchak,et al. Modeling word burstiness using the Dirichlet distribution , 2005, ICML.
[7] Michael I. Jordan,et al. Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res.
[8] Naonori Ueda,et al. Bayesian Unsupervised Word Segmentation with Nested Pitman-Yor Language Modeling , 2009, ACL.
[9] T. Minka. Estimating a Dirichlet distribution , 2012 .
[10] Hiroki Arimura,et al. Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications , 2001, CPM.
[11] Soon Myoung Chung,et al. Text document clustering based on frequent word meaning sequences , 2008, Data Knowl. Eng.
[12] Maosong Sun,et al. Word Segmentation Standard in Chinese, Japanese and Korean , 2009, ALR7@IJCNLP.
[13] Sophia Ananiadou,et al. Stochastic Gradient Descent Training for L1-regularized Log-linear Models with Cumulative Penalty , 2009, ACL.
[14] Dell Zhang,et al. Extracting key-substring-group features for text classification , 2006, KDD '06.
[15] Andrew McCallum,et al. An Introduction to Conditional Random Fields for Relational Learning , 2007 .
[16] Gonzalo Navarro,et al. Compressed full-text indexes , 2007, CSUR.
[17] Daniel Jurafsky,et al. A Conditional Random Field Word Segmenter for Sighan Bakeoff 2005 , 2005, IJCNLP.
[18] Sebastian Thrun,et al. Text Classification from Labeled and Unlabeled Documents using EM , 2000, Machine Learning.
[19] Sen Zhang,et al. Two Efficient Algorithms for Linear Time Suffix Array Construction , 2011, IEEE Transactions on Computers.
[20] Andrew McCallum,et al. Topics over time: a non-Markov continuous-time model of topical trends , 2006, KDD '06.
[21] Xin Chen,et al. Probabilistic topic modeling for genomic data interpretation , 2010, 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).
[22] Dell Zhang,et al. Semantic, Hierarchical, Online Clustering of Web Search Results , 2004, APWeb.