Handwritten Document Image Analysis at Los Alamos: Script, Language, and Writer Identification
A system for automatically identifying the script used in a handwritten document image is described. The system was developed using a 496-document dataset representing six scripts, eight languages, and 281 writers. Each document was characterized by the mean, standard deviation, and skew of five connected-component features. Linear discriminant analysis was used to classify new documents, and the classifier was tested using writer-sensitive cross-validation. Classification accuracy averaged 88% across the six scripts. The same method, applied within the Roman subcorpus, discriminated English and German documents with 85% accuracy. Pilot results indicate that a variation of the method may be applicable to writer identification.
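As a rough illustration of the document characterization and evaluation split described above, the following minimal sketch summarizes per-component features into one vector per document and groups documents into folds by writer. It is not the authors' implementation. The abstract does not name the five features, so the feature tuples here are placeholders; the population form of skewness, the reading of "writer-sensitive" as "no writer appears in more than one fold", and the function names `summarize` and `writer_folds` are all assumptions. The linear discriminant classifier itself is omitted.

```python
import math
from collections import defaultdict


def summarize(components):
    """Collapse per-component feature tuples into one document vector.

    components: list of equal-length tuples, one per connected component
    (hypothetical features; the abstract does not say which five are used).
    Returns [mean, std, skew] for each feature, concatenated.
    """
    n = len(components)
    n_feat = len(components[0])
    vector = []
    for j in range(n_feat):
        col = [c[j] for c in components]
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        # Population skewness (an assumed definition); 0 for a constant feature.
        if std == 0:
            skew = 0.0
        else:
            skew = sum((x - mean) ** 3 for x in col) / (n * std ** 3)
        vector.extend([mean, std, skew])
    return vector


def writer_folds(writer_ids, n_folds):
    """Assign document indices to folds so each writer stays in one fold.

    This is one assumed reading of "writer-sensitive" cross-validation:
    a classifier is never tested on a writer it was trained on.
    """
    by_writer = defaultdict(list)
    for idx, writer in enumerate(writer_ids):
        by_writer[writer].append(idx)
    folds = [[] for _ in range(n_folds)]
    for k, writer in enumerate(sorted(by_writer)):
        folds[k % n_folds].extend(by_writer[writer])
    return folds


if __name__ == "__main__":
    # Three components with two placeholder features each.
    doc = [(1.0, 2.0), (2.0, 2.0), (3.0, 2.0)]
    print(summarize(doc))
    # Five documents written by three writers, split into two folds.
    print(writer_folds(["a", "b", "a", "c", "b"], 2))
```

With five features per component, `summarize` would yield a 15-element vector per document, which could then be passed to any linear discriminant classifier trained on all folds but the held-out one.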