Skim-reading thousands of documents in one minute: Data indexing and visualization for multifarious search

In this paper we present an interface based on a recent generative model, the counting grid, which we re-introduce here in its basic version and substantially revise so that it scales to large corpora. We show that thousands of high-order word co-occurrence patterns can be visualized by viewing, for only a few minutes, a new embedding that we propose for text visualization, browsing, and search. We performed preliminary experiments with user tasks such as word spotting, rapid content search, and collateral information acquisition.
