Quantifying the impact of dirty OCR on historical text analysis: Eighteenth Century Collections Online as a case study
[1] Kenneth E. Shirley,et al. LDAvis: A method for visualizing and interpreting topics , 2014 .
[2] Maciej Eder,et al. Mind your corpus: systematic errors in authorship attribution , 2013, Lit. Linguistic Comput..
[3] Maciej Eder,et al. Does size matter? Authorship attribution, small samples, big problem , 2015, Digit. Scholarsh. Humanit..
[4] D. Biber. Methodological Issues Regarding Corpus-based Analyses of Linguistic Variation , 1990 .
[5] Daniel McNamara,et al. Mining for the Meanings of a Murder: The Impact of OCR Quality on the Use of Digitized Historical Newspapers , 2014, Digit. Humanit. Q..
[6] Stan Lipovetsky. Lexical Collocation Analysis: Advances and Applications , 2020, Technometrics.
[7] Beatrice Alex,et al. Digitised historical text: Does it have to be mediOCRe? , 2012, KONVENS.
[8] Philip M. McCarthy,et al. MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment , 2010, Behavior research methods.
[9] Arthur Spirling,et al. Text Preprocessing For Unsupervised Learning: Why It Matters, When It Misleads, And What To Do About It , 2017, Political Analysis.
[10] Paddy Bullard,et al. Digital Humanities and Electronic Resources in the Long Eighteenth Century , 2013 .
[11] Trevor Hastie,et al. Regularization Paths for Generalized Linear Models via Coordinate Descent , 2010, Journal of statistical software.
[12] John Burrows,et al. All the Way Through: Testing for Authorship in Different Frequency Strata , 2007, Lit. Linguistic Comput..
[13] David Mimno,et al. Evaluating the Stability of Embedding-based Word Similarities , 2018, TACL.
[14] David M. Mimno,et al. Comparing Apples to Apple: The Effects of Stemmers on Topic Models , 2016, TACL.
[15] Rose Holley,et al. How Good Can It Get? Analysing and Improving OCR Accuracy in Large Scale Historic Newspaper Digitisation Programs , 2009, D Lib Mag..
[16] Anke Lüdeling,et al. Corpus Linguistics: An International Handbook , 2009 .
[17] Mark Johnson,et al. Unsupervised learning of multi-word verbs , 2001 .
[18] Douglas Biber,et al. Representativeness in corpus design , 1993 .
[19] P. Spedding. "The New Machine": Discovering the Limits of ECCO , 2011 .
[20] Stefan Evert,et al. Collocation Candidate Extraction from Dependency-Annotated Corpora: Exploring Differences across Parsers and Dependency Annotation Schemes , 2018 .
[21] Mike Kestemont,et al. Stylometry with R: A Package for Computational Text Analysis , 2016, R J..
[22] Tony McEnery,et al. Collocations in context: a new perspective on collocation networks , 2015 .
[23] Klaus U. Schulz,et al. PoCoTo - an open source system for efficient interactive postcorrection of OCRed historical texts , 2014, DATeCH '14.
[24] R. Harald Baayen,et al. How Variable May a Constant be? Measures of Lexical Richness in Perspective , 1998, Comput. Humanit..
[25] Tony McEnery,et al. Collocations in Corpus‐Based Language Learning Research: Identifying, Comparing, and Interpreting the Evidence , 2017 .
[26] Greta Franzini,et al. Attributing Authorship in the Noisy Digitized Correspondence of Jacob and Wilhelm Grimm , 2018, Front. Digit. Humanit..
[27] Michael Piotrowski,et al. Natural Language Processing for Historical Texts , 2012, Synthesis Lectures on Human Language Technologies.
[28] Peter de Bolla. The Architecture of Concepts: The Historical Formation of Human Rights , 2013 .
[29] Isabelle Boydens. Informatique, normes et temps , 1999 .