Information access in the presence of OCR errors

Over the last 15 years, the Information Science Research Institute (ISRI) at the University of Nevada, Las Vegas (UNLV) has conducted research on information access in the presence of OCR errors. Our research has focused on issues associated with the construction of large document databases. In this paper, we highlight our findings and describe our current activities.
