Bayesian text analytics for document collections
[1] Inderjit S. Dhillon, et al. Iterative clustering of high dimensional text data augmented by local search, 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings.
[2] Elaine Toms, et al. The effect of speech recognition accuracy rates on the usefulness and usability of webcast archives, 2006, CHI.
[3] Michael I. Jordan, et al. Latent Dirichlet Allocation, 2001, J. Mach. Learn. Res.
[4] Shourya Roy, et al. How Much Noise Is Too Much: A Study in Automatic Text Classification, 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).
[5] Sean Gerrish, et al. Predicting Legislative Roll Calls from Text, 2011, ICML.
[6] M. Stephens. Dealing with label switching in mixture models, 2000.
[7] Wei Li, et al. Pachinko allocation: DAG-structured mixture models of topic correlations, 2006, ICML.
[8] L. Hubert, et al. Comparing partitions, 1985.
[9] Christian P. Robert, et al. The Bayesian choice: from decision-theoretic foundations to computational implementation, 2007.
[10] D. Aldous. Exchangeability and related topics, 1985.
[11] Chong Wang, et al. Reading Tea Leaves: How Humans Interpret Topic Models, 2009, NIPS.
[12] Eric K. Ringger, et al. Topics Over Nonparametric Time: A Supervised Topic Model Using Bayesian Nonparametric Density Estimation, 2012, BMA.
[13] Dan Klein, et al. Unsupervised Coreference Resolution in a Nonparametric Bayesian Model, 2007, ACL.
[14] David M. Blei, et al. Uncovering, understanding, and predicting links, 2011.
[15] Radford M. Neal. Markov Chain Sampling Methods for Dirichlet Process Mixture Models, 2000.
[16] Richard A. Harshman, et al. Indexing by Latent Semantic Analysis, 1990, J. Am. Soc. Inf. Sci.
[17] Daniel P. Lopresti. Optical character recognition errors and their effects on natural language processing, 2009, International Journal on Document Analysis and Recognition (IJDAR).
[18] Thomas L. Griffiths, et al. Hierarchical Topic Models and the Nested Chinese Restaurant Process, 2003, NIPS.
[19] Charles Elkan, et al. Accounting for burstiness in topic models, 2009, ICML '09.
[20] Mark Steyvers, et al. Finding scientific topics, 2004, Proceedings of the National Academy of Sciences of the United States of America.
[21] Ali A. Ghorbani, et al. An Iterative Hybrid Filter-Wrapper Approach to Feature Selection for Document Clustering, 2009, Canadian Conference on AI.
[22] Andrew McCallum, et al. A comparison of event models for naive bayes text classification, 1998, AAAI 1998.
[23] Kazem Taghva, et al. Results of applying probabilistic IR to OCR text, 1994, SIGIR '94.
[24] Edward Y. Chang, et al. PLDA: Parallel Latent Dirichlet Allocation for Large-Scale Applications, 2009, AAIM.
[25] Yuchou Chang, et al. Unsupervised feature selection using clustering ensembles and population based incremental learning algorithm, 2008, Pattern Recognit.
[26] Marina Meila, et al. An Experimental Comparison of Model-Based Clustering Methods, 2004, Machine Learning.
[27] Andrew McCallum, et al. Topic Models Conditioned on Arbitrary Features with Dirichlet-multinomial Regression, 2008, UAI.
[28] Thorsten Joachims, et al. A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization, 1997, ICML.
[29] Thomas Hofmann, et al. Probabilistic latent semantic indexing, 1999, SIGIR '99.
[30] Yee Whye Teh, et al. A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation, 2006, NIPS.
[31] Kazem Taghva, et al. Evaluating text categorization in the presence of OCR errors, 2000, IS&T/SPIE Electronic Imaging.
[32] D. Blackwell, et al. Ferguson Distributions Via Polya Urn Schemes, 1973.
[33] Max Welling, et al. Fast collapsed gibbs sampling for latent dirichlet allocation, 2008, KDD.
[34] Horst Bunke, et al. Recognition of cursive Roman handwriting: past, present and future, 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.
[35] John D. Lafferty, et al. Dynamic topic models, 2006, ICML.
[36] Byron Dom, et al. An Information-Theoretic External Cluster-Validity Measure, 2002, UAI.
[37] Max Welling, et al. Asynchronous Distributed Learning of Topic Models, 2008, NIPS.
[38] Andrew McCallum, et al. Rethinking LDA: Why Priors Matter, 2009, NIPS.
[39] Bo Pang, et al. Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales, 2005, ACL.
[40] Yiming Yang, et al. A Probabilistic Model for Online Document Clustering with Application to Novelty Detection, 2004, NIPS.
[41] Joydeep Ghosh, et al. Generative Model-based Document Clustering: a Comparative Study, 2003, under consideration for publication in Knowledge and Information Systems.
[42] Andrew McCallum, et al. Organizing the OCA: learning faceted subjects from a library of digital books, 2007, JCDL '07.
[43] Shipeng Yu, et al. Advanced probabilistic models for clustering and projection, 2006.
[44] Francis R. Bach, et al. Online Learning for Latent Dirichlet Allocation, 2010, NIPS.
[45] Eric K. Ringger, et al. A synthetic document image dataset for developing and evaluating historical document processing methods, 2011, Electronic Imaging.
[46] Venu Govindaraju, et al. Using topic models for OCR correction, 2009, International Journal on Document Analysis and Recognition (IJDAR).
[47] Ata Kabán, et al. On an equivalence between PLSI and LDA, 2003, SIGIR.
[48] David M. Blei, et al. Supervised Topic Models, 2007, NIPS.
[49] Evangelos E. Milios, et al. Latent Dirichlet Co-Clustering, 2006, Sixth International Conference on Data Mining (ICDM'06).
[50] Eric K. Ringger, et al. Evaluating Models of Latent Document Semantics in the Presence of OCR Errors, 2010, EMNLP.
[51] Michael L. Wick, et al. Context-Sensitive Error Correction: Using Topic Models to Improve OCR, 2007, Ninth International Conference on Document Analysis and Recognition (ICDAR 2007).
[52] Michael, et al. On a Class of Bayesian Nonparametric Estimates: I. Density Estimates, 2008.
[53] M. Escobar. Estimating Normal Means with a Dirichlet Process Prior, 1994.
[54] Karl Pearson F.R.S. LIII. On lines and planes of closest fit to systems of points in space, 1901.
[55] Ruslan Salakhutdinov, et al. Evaluation methods for topic models, 2009, ICML '09.
[56] David B. Dunson, et al. Bayesian Data Analysis, 2010.
[57] Henry S. Baird, et al. The State of the Art of Document Image Degradation Modelling, 2007.
[58] T. Ferguson. A Bayesian Analysis of Some Nonparametric Problems, 1973.
[59] Wei-Ying Ma, et al. An Evaluation on Feature Selection for Text Clustering, 2003, ICML.
[60] David M. Blei, et al. Relational Topic Models for Document Networks, 2009, AISTATS.
[61] Yiming Yang, et al. A Comparative Study on Feature Selection in Text Categorization, 1997, ICML.
[62] Thomas L. Griffiths, et al. Integrating Topics and Syntax, 2004, NIPS.
[63] Thomas L. Griffiths, et al. A fully Bayesian approach to unsupervised part-of-speech tagging, 2007, ACL.
[64] Eric K. Ringger, et al. Evaluating supervised topic models in the presence of OCR errors, 2013, Electronic Imaging.
[65] Nasser M. Nasrabadi, et al. Pattern Recognition and Machine Learning, 2006, Technometrics.
[66] David J. Newman, et al. Probabilistic topic decomposition of an eighteenth-century American newspaper, 2006, J. Assoc. Inf. Sci. Technol.
[67] Eric K. Ringger, et al. Improving optical character recognition through efficient multiple system alignment, 2009, JCDL '09.
[68] Thomas L. Griffiths, et al. The Author-Topic Model for Authors and Documents, 2004, UAI.
[69] H. Raiffa, et al. Introduction to Statistical Decision Theory, 1996.
[70] Michael A. West, et al. Computing Nonparametric Hierarchical Models, 1998.
[71] Ken Lang, et al. NewsWeeder: Learning to Filter Netnews, 1995, ICML.
[72] Jason Baldridge, et al. Supervised Text-based Geolocation Using Language Models on an Adaptive Grid, 2012, EMNLP.
[73] Eric K. Ringger, et al. Model-based document clustering with a collapsed gibbs sampler, 2008, KDD.
[74] Charles Nicholas, et al. Feature Selection and Document Clustering, 2004.
[75] Arindam Banerjee, et al. Topic Models over Text Streams: A Study of Batch and Online Unsupervised Learning, 2007, SDM.
[76] A. McCallum, et al. Topical N-Grams: Phrase and Topic Discovery, with an Application to Information Retrieval, 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).
[77] M. Meilă. Comparing clusterings: an information based distance, 2007.
[78] Eric K. Ringger, et al. Progressive Alignment and Discriminative Error Correction for Multiple OCR Engines, 2011, 2011 International Conference on Document Analysis and Recognition.
[79] Henry S. Baird, et al. Document image defect models, 1995.
[80] W. Michael Conklin, et al. Monte Carlo Methods in Bayesian Computation, 2001, Technometrics.
[81] Eric P. Xing, et al. A Nonparametric Mixture Model for Topic Modeling over Time, 2012, SDM.
[82] Yee Whye Teh, et al. On Smoothing and Inference for Topic Models, 2009, UAI.
[83] Daniel P. Lopresti. Performance evaluation for text processing of noisy inputs, 2005, SAC '05.
[84] Eric C. Jensen, et al. A Survey of Retrieval Strategies for OCR Text Collections, 2002.
[85] Andrew McCallum, et al. Using Maximum Entropy for Text Classification, 1999.
[86] Julia Hirschberg, et al. V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure, 2007, EMNLP.
[87] Naonori Ueda, et al. Deterministic annealing EM algorithm, 1998, Neural Networks.
[88] David D. Lewis, et al. Reuters-21578 Text Categorization Test Collection, Distribution 1.0, 1997.
[89] P. Green, et al. Corrigendum: On Bayesian analysis of mixtures with an unknown number of components, 1997.
[90] Xiaohu Zhang, et al. Training on severely degraded text-line images, 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.
[91] Andrew McCallum, et al. Topics over time: a non-Markov continuous-time model of topical trends, 2006, KDD '06.
[92] George Karypis, et al. A Comparison of Document Clustering Techniques, 2000.