Multi-dimensional Information Ordering to Support Decision-Making Processes

Massive amounts of textual and digital data are created daily by business and public activities. Organising, mining and summarising such a rich and large information source is necessary to capture the essential and critical knowledge it contains. Such mining is of strategic importance in many domains, including innovation (e.g. to mine technological reviews and scientific literature) and electronic commerce (e.g. to mine customer reviews). Information content generally has several important aspects, each mapped onto a visualisation dimension, and the number of these dimensions must be reduced to enable meaningful interactive exploration. In this paper, we propose a novel strategy to mine and organise document sets so that they can be presented consistently and the interesting and relevant information patterns they contain are highlighted. Our method is based on formulating a global optimisation problem that is solved using the Traveling Salesman Problem (TSP) approach. We show how this compact formulation opens up interesting possibilities for mining document collections mapped onto multidimensional information sets. We discuss the issue of scalability and show that scalable solutions to the problem exist. We demonstrate the effectiveness of our method on several types of documents drawn from real business cases.
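To illustrate how ordering a document set can be cast as a TSP, the following is a minimal sketch under stated assumptions, not the method of this paper. It assumes each document is already represented by a numeric feature vector (one coordinate per information dimension) and that dissimilarity is plain Euclidean distance. It looks for a short open path (a Hamiltonian path rather than a closed tour) through all documents, so that similar documents end up adjacent in the presented order. The solver is a simple nearest-neighbour construction followed by 2-opt improvement, a heuristic swapped in here for illustration. All function names (`distance_matrix`, `nearest_neighbour_path`, `two_opt`, `order_documents`) are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def distance_matrix(vectors: Sequence[Vector]) -> List[List[float]]:
    """Pairwise Euclidean distances between (assumed) document feature vectors."""
    return [[math.dist(a, b) for b in vectors] for a in vectors]


def path_length(order: Sequence[int], dist: List[List[float]]) -> float:
    """Total length of the open path visiting documents in the given order."""
    return sum(dist[order[i]][order[i + 1]] for i in range(len(order) - 1))


def nearest_neighbour_path(dist: List[List[float]], start: int = 0) -> List[int]:
    """Greedy construction: repeatedly move to the closest unvisited document."""
    order = [start]
    unvisited = set(range(len(dist))) - {start}
    while unvisited:
        last = order[-1]
        # Ties are broken by the smaller index so the result is deterministic.
        nxt = min(unvisited, key=lambda j: (dist[last][j], j))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def two_opt(order: Sequence[int], dist: List[List[float]]) -> List[int]:
    """Improve an open path by reversing segments while that shortens it."""
    order = list(order)
    n = len(order)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for k in range(i + 1, n):
                # Reversing order[i..k] only changes the edge entering position i
                # and the edge leaving position k (distances are symmetric).
                delta = 0.0
                if i > 0:
                    delta += dist[order[i - 1]][order[k]] - dist[order[i - 1]][order[i]]
                if k < n - 1:
                    delta += dist[order[i]][order[k + 1]] - dist[order[k]][order[k + 1]]
                if delta < -1e-12:
                    order[i:k + 1] = order[i:k + 1][::-1]
                    improved = True
    return order


def order_documents(vectors: Sequence[Vector]) -> List[int]:
    """Return a document ordering that tends to keep similar documents adjacent."""
    dist = distance_matrix(vectors)
    return two_opt(nearest_neighbour_path(dist), dist)


if __name__ == "__main__":
    # Four toy "documents", each described by a single hypothetical score.
    docs = [[0.0], [1.0], [-1.5], [3.0]]
    order = order_documents(docs)
    print(order, path_length(order, distance_matrix(docs)))  # prints: [3, 1, 0, 2] 4.5
```

In this toy run the greedy path `[0, 1, 3, 2]` has length 7.5, and a single 2-opt reversal shortens it to 4.5, which lists the scores in sorted order. Each 2-opt pass examines all O(n²) segment reversals, so this sketch does not address the scalability question discussed in the abstract; large collections would call for faster heuristics.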
