Classifying chart images with sparse coding

We present an approach for classifying chart images with sparse coding. Three chart categories are considered: bar charts, pie charts and line graphs. We apply a Laplacian of Gaussian (LoG) filter to suppress image noise and detect candidate regions of interest. Since charts typically contain both text and graphics, we identify text and graphic regions and learn informative features from each. Each image is then represented by a feature vector, from which a sparse representation is learned with a dictionary learning algorithm and used for classification. We evaluate the proposed approach on a set of charts collected from the internet, and the encouraging results support the effectiveness of the method.
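The abstract does not spell out how the sparse representation drives classification. One common scheme is to sparse-code a feature vector over a dictionary whose atoms are grouped by class, then assign the class whose atoms reconstruct the vector with the smallest residual. The sketch below assumes that scheme, uses plain matching pursuit as a stand-in sparse coder, and takes a hand-written dictionary; the dictionary-learning step that would produce the atoms, and the LoG and text/graphic feature extraction, are not shown. The names `matching_pursuit`, `classify` and `toy_dictionary` and the 4-D vectors are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> Vector:
    """Scale v to unit L2 norm (zero vectors are returned unchanged)."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n > 0 else list(v)


def matching_pursuit(y: Vector, atoms: List[Vector], n_nonzero: int) -> List[float]:
    """Greedy sparse code of y over unit-norm atoms (plain matching pursuit)."""
    residual = list(y)
    coef = [0.0] * len(atoms)
    for _ in range(n_nonzero):
        # Select the atom most correlated with the current residual.
        scores = [dot(residual, d) for d in atoms]
        j = max(range(len(atoms)), key=lambda i: abs(scores[i]))
        coef[j] += scores[j]
        # Remove that atom's contribution from the residual.
        residual = [r - scores[j] * dj for r, dj in zip(residual, atoms[j])]
    return coef


def classify(y: Vector, dictionary: Dict[str, List[Vector]], n_nonzero: int = 2) -> str:
    """Label y by the class whose atoms best reconstruct it from its sparse code.

    Assumption: one dictionary with atoms grouped by class label; this is an
    illustrative residual-based rule, not necessarily the paper's classifier.
    """
    labels: List[str] = []
    atoms: List[Vector] = []
    for label, class_atoms in dictionary.items():
        for a in class_atoms:
            labels.append(label)
            atoms.append(normalize(a))
    coef = matching_pursuit(y, atoms, n_nonzero)

    best_label, best_residual = "", math.inf
    for label in dictionary:
        # Rebuild y from this class's coefficients only.
        recon = [0.0] * len(y)
        for c, a, lab in zip(coef, atoms, labels):
            if lab == label:
                recon = [r + c * ai for r, ai in zip(recon, a)]
        residual = math.sqrt(sum((yi - ri) ** 2 for yi, ri in zip(y, recon)))
        if residual < best_residual:
            best_label, best_residual = label, residual
    return best_label


if __name__ == "__main__":
    # Hypothetical toy dictionary: 4-D feature vectors, two atoms per chart class.
    toy_dictionary = {
        "bar": [[1.0, 0.0, 0.0, 0.0], [0.9, 0.1, 0.0, 0.0]],
        "pie": [[0.0, 1.0, 0.0, 0.0], [0.0, 0.8, 0.2, 0.0]],
        "line": [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]],
    }
    print(classify([0.0, 0.0, 0.5, 0.5], toy_dictionary))  # -> line
    print(classify([2.0, 0.1, 0.0, 0.0], toy_dictionary))  # -> bar
```

Grouping atoms by class lets one sparse code serve all three chart categories; replacing matching pursuit with orthogonal matching pursuit or an L1 solver would change only `matching_pursuit`, not the residual-based decision rule.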
