Least Square Projection: A Fast High-Precision Multidimensional Projection Technique and Its Application to Document Mapping

The problem of projecting multidimensional data into lower dimensions has been studied by many researchers because of its potential use in many kinds of data analysis. This paper presents a novel multidimensional projection technique based on least-squares approximations. The approximations compute the coordinates of a set of projected points from the coordinates of a reduced number of control points with defined geometry. We name the technique least square projections (LSP). Starting from an initial projection of the control points, LSP positions their neighboring points through a numerical solution that aims to preserve the similarity relationships among the points, as given by a metric in the original m-dimensional space (mD). Only a small number of distance calculations is needed to perform the projection, and no repositioning of the points is required to reach a final solution of satisfactory precision. The results show that the technique forms groups of points in 2D according to their degree of similarity. We illustrate this capability by mapping collections of textual documents from varied sources, a strategic yet difficult application. LSP is faster and more accurate than other existing high-quality methods, particularly for mapping text collections, the application on which it was most extensively tested.
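The positioning step described above can be illustrated with a minimal Python/NumPy sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the Euclidean metric, random selection of control points, classical MDS for their initial 2D placement, a k-nearest-neighbor graph with uniform weights, and a dense `numpy.linalg.lstsq` solve. The function names `classical_mds` and `lsp_sketch` are hypothetical. Each point contributes one equation asking it to lie at the centroid of its mD neighbors, and each control point contributes one extra equation pinning it to its initial 2D position; the over-determined system is then solved in the least-squares sense. For brevity the sketch computes all pairwise distances, which the technique itself aims to avoid.

```python
import numpy as np


def classical_mds(dist, dim=2):
    """Classical (Torgerson) MDS: embed points given only their distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering  # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(gram)                   # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:dim]                  # indices of the `dim` largest
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))


def lsp_sketch(x, n_control, n_neighbors=5, seed=0):
    """Hypothetical LSP-style projection of x (n points in mD) to 2D."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]

    # Assumption: Euclidean metric; all pairwise distances computed for brevity.
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)

    # Assumption: control points drawn at random, placed in 2D with classical MDS.
    control = rng.choice(n, size=n_control, replace=False)
    y_control = classical_mds(d[np.ix_(control, control)])

    # k nearest neighbors of every point in mD (self excluded via an inf diagonal).
    d_nb = d.copy()
    np.fill_diagonal(d_nb, np.inf)
    neighbors = np.argsort(d_nb, axis=1)[:, :n_neighbors]

    # Over-determined system: n "lie at the centroid of your neighbors" rows,
    # followed by one row per control point pinning it to its 2D position.
    a = np.zeros((n + n_control, n))
    b = np.zeros((n + n_control, 2))
    for i in range(n):
        a[i, i] = 1.0
        a[i, neighbors[i]] -= 1.0 / n_neighbors
    for row, c in enumerate(control):
        a[n + row, c] = 1.0
        b[n + row] = y_control[row]

    # Least-squares solution; x and y coordinates are solved together as two columns.
    y, *_ = np.linalg.lstsq(a, b, rcond=None)
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two well-separated Gaussian blobs in 10D.
    x = np.vstack([rng.normal(0.0, 1.0, (50, 10)), rng.normal(6.0, 1.0, (50, 10))])
    y = lsp_sketch(x, n_control=10)
    print(y.shape)  # (100, 2)
```

Solving both coordinate columns in one call works because the x and y coordinates share the same system matrix. For large collections, a sparse iterative solver (for example, conjugate gradients on the normal equations) would be the natural replacement for the dense solve used here.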
