Document Cards: A Top Trumps Visualization for Documents

Finding suitable, space-saving views of a document's main content is crucial for providing convenient access to large document collections on display devices of different sizes. We present a novel compact visualization that represents a document's key semantics as a mixture of images and important key terms, similar to the cards in a Top Trumps game. The key terms are extracted using an advanced text mining approach based on fully automatic document structure extraction. The images and their captions are extracted using a graphical heuristic, and the captions are used for a semi-semantic image weighting. Furthermore, we classify the images by their color histograms and show at least one representative from each non-empty image class. The approach is demonstrated on a complete year of IEEE InfoVis publications. The method can easily be applied to other publication collections and to other sets of documents that contain images.
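The rule of showing at least one representative from each non-empty image class can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_representatives`, the `(name, image_class, weight)` tuple layout, the class labels, and the policy of filling leftover slots by weight are all assumptions made for illustration. The class labels and weights are assumed to come from earlier steps (histogram-based classification and caption-based weighting), which are not modeled here.

```python
from collections import defaultdict


def select_representatives(images, max_images):
    """Pick images for one card (hypothetical helper, not from the paper).

    images: list of (name, image_class, weight) tuples; this layout is assumed.
    max_images: desired number of images on the card.
    Every class that occurs gets one image, even if that exceeds max_images.
    """
    # Group by class. Only classes that actually occur become keys,
    # so every group is non-empty by construction.
    by_class = defaultdict(list)
    for name, image_class, weight in images:
        by_class[image_class].append((weight, name))

    # First pass: the highest-weighted image of each class.
    chosen = [max(candidates) for candidates in by_class.values()]

    # Second pass: fill any remaining slots with the best leftover images.
    chosen_names = {name for _, name in chosen}
    leftovers = sorted(
        ((weight, name) for name, _, weight in images if name not in chosen_names),
        reverse=True,
    )
    free_slots = max(0, max_images - len(chosen))
    chosen.extend(leftovers[:free_slots])

    # Return names ordered from highest to lowest weight.
    return [name for _, name in sorted(chosen, reverse=True)]


# Made-up example data: two photos, one chart, one table.
example = [
    ("fig1", "photo", 0.9),
    ("fig2", "photo", 0.8),
    ("fig3", "chart", 0.3),
    ("fig4", "table", 0.1),
]
print(select_representatives(example, 3))
```

With three slots, the lower-weighted chart and table are kept in preference to the second photo, because class coverage takes priority over raw weight in this sketch.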
