Character Image Patterns as Big Data

The ambitious goal of this research is to understand the real distribution of character patterns. Ideally, if we can collect all possible character patterns, we can totally understand how they are distributed in the image space. In addition, we also have the perfect character recognizer because we know the correct class for any character image. Of course, it is practically impossible to collect all those patterns - however, if we collect character patterns massively and analyze how the distribution changes according to the increase of patterns, we will be able to estimate the real distribution asymptotically. For this purpose, we use 822,714 manually ground-truthed 32×32 handwritten digit patterns in this paper. The distribution of those patterns are observed by nearest neighbor analysis and network analysis, both of which do not make any approximation (such as low-dimensional representation) and thus do not corrupt the details of the distribution.

[1]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Krista A. Ehinger,et al.  SUN database: Large-scale scene recognition from abbey to zoo , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[3]  Harry Shum,et al.  Face Hallucination: Theory and Practice , 2007, International Journal of Computer Vision.

[4]  Karl Sims,et al.  Handwritten Character Classification Using Nearest Neighbor in Large Databases , 1994, IEEE Trans. Pattern Anal. Mach. Intell..

[5]  Antonio Criminisi,et al.  Harvesting Image Databases from the Web , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[6]  Antonio Torralba,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 80 Million Tiny Images: a Large Dataset for Non-parametric Object and Scene Recognition , 2022 .

[7]  Antonio Torralba,et al.  Nonparametric Scene Parsing via Label Transfer , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Parham Aarabi,et al.  Tiny Videos: A Large Data Set for Nonparametric Video Retrieval and Frame Classification , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Alexei A. Efros,et al.  Scene completion using millions of photographs , 2007, SIGGRAPH 2007.