Improved Similarity Trees and their Application to Visual Data Classification

An alternative to multidimensional projections for the visual analysis of multidimensional data is the use of similarity trees, such as Neighbor Joining trees. These trees organize data objects on the visual plane according to their levels of similarity and are highly capable of detecting and separating groups and subgroups of objects. Besides this similarity-based hierarchical organization, their advantages include reduced point clutter, high precision, and a consistent view of the data set during focusing, which offers an intuitive way to view the general structure of the data set and to drill down to groups and subgroups of interest. Disadvantages of similarity trees based on neighbor joining strategies include their computational cost and the presence of virtual nodes that occupy too much of the visual space. This paper presents an improved version of the similarity tree technique, built on two procedures. The first is a strategy that replaces virtual nodes by promoting real leaf nodes to their place, which saves large portions of display space while maintaining the expressiveness and precision of the technique. The second is an implementation that significantly accelerates the algorithm, making it usable for larger data sets. We also illustrate the applicability of the technique to visual data mining, showing its advantages in supporting visual classification of data sets, with special attention to image classification. We demonstrate the capabilities of the tree for analysis and iterative manipulation and use those capabilities to evolve toward a satisfactory data organization and classification.
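To make the notions of virtual nodes and computational cost concrete, the following is a minimal sketch of the classical neighbor joining procedure on a small distance matrix. It is not the improved technique presented in this paper. The function name `neighbor_joining`, the `v0, v1, ...` naming of virtual nodes, and the edge-list output are assumptions made for illustration only.

```python
from itertools import combinations


def neighbor_joining(dist, labels):
    """Classical neighbor joining on a symmetric distance matrix.

    Returns a list of edges (node, node, branch_length). Internal
    ("virtual") nodes are named "v0", "v1", ... (illustrative naming).
    """
    # Working copy of the distances, keyed by node name.
    d = {a: {b: dist[i][j] for j, b in enumerate(labels)}
         for i, a in enumerate(labels)}
    nodes = list(labels)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums used by the Q criterion.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        v = f"v{next_id}"
        next_id += 1
        # Branch lengths from the new virtual node to the joined pair.
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        edges += [(v, a, la), (v, b, lb)]
        # Distances from the virtual node to every remaining node.
        d[v] = {v: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[v][c] = d[c][v] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [v]
    # Connect the last two remaining nodes directly.
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges


if __name__ == "__main__":
    D = [[0, 5, 9, 9, 8],
         [5, 0, 10, 10, 9],
         [9, 10, 0, 8, 7],
         [9, 10, 8, 0, 3],
         [8, 9, 7, 3, 0]]
    for edge in neighbor_joining(D, list("abcde")):
        print(edge)
```

In this sketch every join introduces one virtual node, so a tree over n objects carries n - 2 of them, and each iteration scans all remaining pairs, which gives cubic running time in this naive form. These are the two costs that the improvements described above target.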
