Content-based layouts for exploratory metadata search in scientific research data

Today's digital libraries (DLs) archive vast amounts of information in the form of text, videos, images, data measurements, etc. User access to DL content can rely on similarity between metadata elements, or on similarity between the data itself (content-based similarity). We consider the problem of exploratory search in large DLs of time-oriented data. We propose a novel approach for overview-first exploration of data collections based on user-selected metadata properties. In a 2D layout, the entities of the selected property are positioned according to their similarity with respect to the underlying data content. The display is enhanced by compact summarizations of the underlying data elements, and forms the basis for exploratory navigation of the data space. The approach is proposed as an interface for visual exploration that leads the user to discover interesting relationships between data items, relying on content-based similarity between data items and their respective metadata labels. We apply the method to real data sets from the earth observation community, showing its applicability and usefulness.
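The core idea of the layout can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the paper: the abstract does not name the descriptor, the distance function, or the layout algorithm, so the sketch assumes that each entity of the selected metadata property is summarized by the mean of its items' feature vectors, that entities are compared by Euclidean distance, and that classical multidimensional scaling is used as a stand-in projection. The function names `entity_descriptors`, `classical_mds`, and `layout_entities` are hypothetical.

```python
import numpy as np

def entity_descriptors(features, labels):
    """Average the content feature vectors of all items sharing a metadata label.

    Assumption: a mean vector is used as the per-entity content summary.
    """
    names = sorted(set(labels))
    labels = np.asarray(labels)
    return names, np.stack([features[labels == n].mean(axis=0) for n in names])

def classical_mds(dist, dims=2):
    """Project a distance matrix to `dims` dimensions with classical MDS.

    Assumption: classical MDS stands in for whatever layout method is actually used.
    """
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centered squared distances give a Gram matrix of the point set.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dims]
    # Clip small negative eigenvalues caused by rounding before the square root.
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))

def layout_entities(features, labels):
    """Return entity names and 2D positions reflecting content-based similarity."""
    names, desc = entity_descriptors(features, labels)
    diff = desc[:, None, :] - desc[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise Euclidean distances
    return names, classical_mds(dist)

# Toy data: six measurements from three hypothetical stations A, B, C.
features = np.array([
    [0.0, 0.0, 0.0], [0.2, 0.0, 0.0],
    [1.0, 0.0, 0.0], [1.2, 0.0, 0.0],
    [10.0, 10.0, 0.0], [10.0, 10.2, 0.0],
])
labels = ["A", "A", "B", "B", "C", "C"]
names, xy = layout_entities(features, labels)
```

In this toy setup, stations A and B have similar content and end up close together in the layout, while C is placed far from both. Any other descriptor, distance, or projection could be substituted without changing the overall structure.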
