A universal information theoretic approach to the identification of stopwords
Martin Gerlach | Hanyu Shi | Luís A. Nunes Amaral
[1] Jeffrey Dean, et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.
[2] Michael I. Jordan, et al. Combinatorial Clustering and the Beta Negative Binomial Process, 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[3] Xiaotie Deng, et al. Automatic construction of Chinese stop word list, 2006.
[4] D. Rebholz-Schuhmann, et al. Text-mining solutions for biomedical research: enabling integrative biology, 2012, Nature Reviews Genetics.
[5] David M. Mimno, et al. Applications of Topic Models, 2017, Found. Trends Inf. Retr.
[6] Gerard Salton, et al. On the Specification of Term Values in Automatic Indexing, 1973.
[7] S. Gries. Dispersions and adjusted frequencies in corpora, 2008.
[8] Serkan Günal, et al. The impact of preprocessing on text classification, 2014, Inf. Process. Manag.
[9] A. Karr. Exploratory Data Mining and Data Cleaning, 2006.
[10] Leto Peel, et al. The ground truth about metadata and community detection in networks, 2016, Science Advances.
[11] George Kingsley Zipf, et al. Human behavior and the principle of least effort, 1949.
[12] Iadh Ounis, et al. Automatically Building a Stopword List for an Information Retrieval System, 2005, J. Digit. Inf. Manag.
[13] Konrad P. Körding, et al. A high-reproducibility and high-accuracy method for automated topic classification, 2014, ArXiv.
[14] Sunil Arya, et al. Space-time tradeoffs for approximate nearest neighbor searching, 2009, JACM.
[15] Konrad P. Körding, et al. Science Concierge: A Fast Content-Based Recommendation System for Scientific Publications, 2016, PLoS ONE.
[16] James A. Evans, et al. Machine Translation: Mining Text for Social Theory, 2016.
[17] David M. Blei, et al. Supervised Topic Models, 2007, NIPS.
[18] Michael I. Jordan, et al. Latent Dirichlet Allocation, 2001, J. Mach. Learn. Res.
[19] Stein Aerts, et al. cisTopic: cis-regulatory topic modeling on single-cell ATAC-seq data, 2019, Nature Methods.
[20] Boxi Kang, et al. Landscape of Infiltrating T Cells in Liver Cancer Revealed by Single-Cell Sequencing, 2017, Cell.
[21] Kevin D. Seppi, et al. Preprocessor Selection for Machine Learning Pipelines, 2018, ArXiv.
[22] Francis R. Bach, et al. Online Learning for Latent Dirichlet Allocation, 2010, NIPS.
[23] David M. Mimno, et al. Comparing Apples to Apple: The Effects of Stemmers on Topic Models, 2016, TACL.
[24] Charu C. Aggarwal, et al. A Survey of Text Clustering Algorithms, 2012, Mining Text Data.
[25] Chong Wang, et al. Online Variational Inference for the Hierarchical Dirichlet Process, 2011, AISTATS.
[26] David B. Dunson, et al. Probabilistic topic models, 2012, Commun. ACM.
[27] L. Foster, et al. Evaluating measures of association for single-cell transcriptomics, 2019, Nature Methods.
[28] Doug Downey, et al. A new evaluation framework for topic modeling algorithms based on synthetic corpora, 2019, AISTATS.
[29] Marcelo A. Montemurro, et al. Towards the Quantification of the Semantic Information Encoded in Written Language, 2009, Adv. Complex Syst.
[30] Hans Peter Luhn, et al. The Automatic Creation of Literature Abstracts, 1958, IBM J. Res. Dev.
[31] Thomas L. Griffiths, et al. The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies, 2007, JACM.
[32] Frank Lyko, et al. Single-cell transcriptomes of the aging human skin reveal loss of fibroblast priming, 2019, bioRxiv.
[33] Santo Fortunato, et al. Weight Thresholding on Complex Networks, 2018, Physical Review E.
[34] Måns Magnusson, et al. Pulling Out the Stops: Rethinking Stopword Removal for Topic Models, 2017, EACL.
[35] P. Donnelly, et al. Inference of population structure using multilocus genotype data, 2000, Genetics.
[36] Joel Nothman, et al. Stop Word Lists in Free Open-source Software Packages, 2018.