LabBook: Metadata-driven social collaborative data analysis

Open data analysis platforms are being adopted to support collaboration in science and business. Studies suggest that analytic work in an enterprise occurs in a complex ecosystem of people, data, and software working in a coordinated manner. These studies also point to friction between the elements of this ecosystem, which reduces user productivity and quality of work. LabBook is an open, social, and collaborative data analysis platform designed explicitly to reduce this friction and accelerate discovery. Its goal is to help users leverage each other's knowledge and experience to find the data, tools, and collaborators they need to integrate, visualize, and analyze data. The key insight is to collect and use more metadata about all elements of the analytic ecosystem by means of an architecture and user experience that reduce the cost of contributing such metadata. We demonstrate how metadata can be exploited to improve the collaborative user experience and facilitate collaborative data integration and recommendations. We describe a specific use case and discuss several design issues concerning the capture, representation, querying, and use of metadata.
