Paper Information - Advanced Research in Contextual Analysis of Addresses: Phase 3

Abstract: This report describes the continued development and testing of a system for contextual analysis of machine-printed address block images. The system receives a binary image of the address block (locating the address block is not part of this work) and then: (1) segments the image into lines, words, and characters with multiple hypotheses; (2) assigns class confidence to each character hypothesis using neural networks; (3) locates, reads, and reconciles the city name and ZIP code; (4) parses the address block using keyword recognition; (5) if a PO Box is found, reads the box number and verifies it against the postal directory; otherwise, (6) forms a street name lexicon based on contextual information, including the number of street name words, word lengths, recognition of suffixes and directionals, and the ZIP code; (7) forms an additional street name lexicon based on partial recognition of the street words; (8) uses word recognition within these lexicons to rank street name hypotheses; (9) retrieves street and range records from a postal directory; (10) matches information from the retrieved records to the fields on the mailpiece, forming 9-digit ZIP code hypotheses; and (11) applies decision logic to assign the finest supportable depth of sort. In an end-to-end test on data selected for OCR difficulty, using corrected LOS scoring, the system had an encode rate of 50% (with 9.5% error) and an accept rate of 84% (with 9.3% error). This compares favorably with an encode rate of 16.7% (with 13.6% error) and an accept rate of 61% (with 15.5% error) achieved by the current MLOCR machine on this same dataset.
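As an illustration of step (6), the sketch below shows one way a street name lexicon could be narrowed using only contextual cues: the ZIP code, the number of street name words, approximate word lengths, and any recognized suffix or directional. It is not the report's implementation. The `StreetRecord` type, the `contextual_lexicon` function, the `length_slack` tolerance, and the toy directory entries are all hypothetical names and data introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class StreetRecord:
    """Hypothetical, simplified stand-in for one postal directory street record."""
    zip5: str
    name_words: Tuple[str, ...]  # e.g. ("MAIN",) or ("OLD", "MILL")
    suffix: str                  # e.g. "ST"; empty string if none
    directional: str             # e.g. "N"; empty string if none


def contextual_lexicon(
    directory: Iterable[StreetRecord],
    zip5: str,
    word_lengths: Sequence[int],
    suffix: Optional[str] = None,
    directional: Optional[str] = None,
    length_slack: int = 1,
) -> List[str]:
    """Return street names consistent with the contextual cues.

    word_lengths holds the estimated character count of each street name
    word on the mailpiece; length_slack (an assumed tolerance) allows for
    segmentation errors. A suffix or directional of None means that cue
    was not recognized and is therefore not used as a filter.
    """
    names = set()
    for rec in directory:
        if rec.zip5 != zip5:
            continue
        if len(rec.name_words) != len(word_lengths):
            continue
        # Every word must be within length_slack characters of its estimate.
        if any(abs(len(word) - n) > length_slack
               for word, n in zip(rec.name_words, word_lengths)):
            continue
        if suffix is not None and rec.suffix != suffix:
            continue
        if directional is not None and rec.directional != directional:
            continue
        names.add(" ".join(rec.name_words))
    return sorted(names)


# Toy directory, invented for illustration only.
DIRECTORY = [
    StreetRecord("12345", ("MAIN",), "ST", "N"),
    StreetRecord("12345", ("MAPLE",), "RD", ""),
    StreetRecord("12345", ("OLD", "MILL"), "RD", ""),
    StreetRecord("12345", ("PLYMOUTH",), "RD", ""),
    StreetRecord("12346", ("MAIN",), "ST", "S"),
]

if __name__ == "__main__":
    # One street word of about five characters, suffix read as "RD".
    print(contextual_lexicon(DIRECTORY, "12345", [5], suffix="RD"))
```

In a pipeline like the one described, a reduced lexicon of this kind would then be handed to word recognition (step 8) to rank the surviving street name hypotheses; the smaller the lexicon, the less the recognizer has to discriminate between.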

Andrew M. Gillies | Daniel J. Hepp | Alan J. Vayda | Michael A. Janeczko