Analyzing and Improving the Quality of a Historical News Collection using Language Technology and Statistical Machine Learning Methods
[1] Klaus U. Schulz,et al. A visual and interactive tool for optimizing lexical postcorrection of OCR results , 2003, 2003 Conference on Computer Vision and Pattern Recognition Workshop.
[2] Tommi A. Pirinen,et al. HFST Tools for Morphology - An Efficient Open-Source Package for Construction of Morphological Analyzers , 2009, SFCM.
[3] Zeeshan Bhatti,et al. Phonetic based SoundEx & ShapeEx algorithm for Sindhi Spell Checker System , 2014, ArXiv.
[4] Mathias Creutz,et al. Unsupervised models for morpheme segmentation and morphology learning , 2007, TSLP.
[5] O. J. Vrieze,et al. Kohonen Network , 1995, Artificial Neural Networks.
[6] Gregory R. Crane,et al. The challenge of virginia banks: an evaluation of named entity analysis in a 19th-century newspaper collection , 2006, Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '06).
[7] Edwin R. Hancock,et al. Discovering Shape Classes using Tree Edit-Distance and Pairwise Clustering , 2007, International Journal of Computer Vision.
[8] Ismo Raitanen. "Etsikäät hywää ja älläät pahaa." Tiedonhakumenetelmien tuloksellisuuden vertailu merkkivirheitä sisältävässä historiallisessa sanomalehtikokoelmassa , 2012 .
[9] Daniel X. Le,et al. Pattern matching techniques for correcting low-confidence OCR words in a known context , 2000, IS&T/SPIE Electronic Imaging.
[10] Alon Lavie,et al. Meteor Universal: Language Specific Translation Evaluation for Any Target Language , 2014, WMT@ACL.
[11] Hartmut Walravens. A Nordic Digital Newspaper Library , 2006 .
[12] Klaus U. Schulz,et al. On lexical resources for digitization of historical documents , 2009, DocEng '09.
[13] Jilei Tian,et al. n-gram and decision tree based language identification for written words , 2001, IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01..
[14] Leonid Boytsov,et al. Indexing methods for approximate dictionary searching: Comparative analysis , 2011, JEAL.
[15] Simon Tanner,et al. Measuring Mass Text Digitization Quality and Usefulness , 2009 .
[16] Karen Kukich,et al. Techniques for automatically correcting words in text , 1992, CSUR.
[17] Michael I. Jordan,et al. Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..
[18] Otto Chrons,et al. Digitalkoot: Making Old Archives Accessible Using Crowdsourcing , 2011, Human Computation.
[19] Amanda Spink,et al. Real life, real users, and real needs: a study and analysis of user queries on the web , 2000, Inf. Process. Manag..
[20] Joseph P. Turian,et al. Evaluation of machine translation and its evaluation , 2003, MTSUMMIT.
[21] Timo Honkela,et al. A Language-Independent Approach to Keyphrase Extraction and Evaluation , 2008, COLING.
[22] Jaakko J. Väyrynen,et al. WordICA—emergence of linguistic representations for words by independent component analysis , 2010, Natural Language Engineering.
[23] Kimmo Kettunen,et al. Can Type-Token Ratio be Used to Show Morphological Complexity of Languages? , 2014, J. Quant. Linguistics.
[24] M. V. Velzen,et al. Self-organizing maps , 2007 .
[25] Simon Tanner,et al. Measuring Mass Text Digitization Quality and Usefulness: Lessons Learned from Assessing the OCR Accuracy of the British Library's 19th Century Online Newspaper Archive , 2009, D Lib Mag..
[26] Tommi Vatanen,et al. Language Identification of Short Text Segments with N-gram Models , 2010, LREC.
[27] Klaus U. Schulz,et al. Adaptive text correction with Web-crawled domain-dependent dictionaries , 2007, TSLP.
[28] Hartmut Walravens. Connecting to the Past – Newspaper Digitisation in the Nordic Countries , 2006 .