Enabling Data Recommendation in Scientific Workflow Based on Provenance

Comparison plays an important role in scientific research: scientists often make discoveries by studying differences. In life science research in particular, sequence alignment is performed by searching reference data files for similar structures. As the scale of scientific data grows, scientists must spend considerable time selecting appropriate data files for their experiments, a choice in which trust plays a critical role. This paper presents a trust-based method for recommending data to scientists. We first propose an extended provenance model that captures users' behavioral information during scientific workflow execution. This provenance information can be used to compute a user's trust in data and the degree of mutual trust between users. Data files can then be recommended to users based on predicted trust values. We also design and implement a prototype system that improves the usability of scientific workflow systems by providing scientific data recommendations. Our experiments show that the recommended data files help scientists execute workflows successfully.
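The abstract does not give the trust formulas, so the following is only a minimal sketch of the general pipeline under loudly stated assumptions. Provenance is reduced to hypothetical `(user, data_file, run_succeeded)` records. A user's trust in a file is assumed to be the success rate of that user's workflow runs with it. Mutual trust between two users is assumed to be one minus the mean absolute difference of their trust over files both have used. Predicted trust is a mutual-trust-weighted average of other users' trust. All function names, file names, and formulas here are illustrative stand-ins, not the paper's model.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical provenance record: (user, data_file, run_succeeded).
Record = Tuple[str, str, bool]
Trust = Dict[str, Dict[str, float]]


def data_trust(records: List[Record]) -> Trust:
    """Assumed definition: trust in a file = share of the user's runs with it that succeeded."""
    runs = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # user -> file -> [successes, total]
    for user, data_file, ok in records:
        counts = runs[user][data_file]
        counts[0] += int(ok)
        counts[1] += 1
    return {user: {f: s / n for f, (s, n) in files.items()} for user, files in runs.items()}


def mutual_trust(trust: Trust, u: str, v: str) -> float:
    """Assumed measure: 1 - mean |trust difference| over co-used files; 0.0 if none are shared."""
    tu, tv = trust.get(u, {}), trust.get(v, {})
    shared = tu.keys() & tv.keys()
    if not shared:
        return 0.0
    diff = sum(abs(tu[f] - tv[f]) for f in shared) / len(shared)
    return 1.0 - diff


def predict_trust(trust: Trust, u: str, data_file: str) -> float:
    """Weighted average of other users' trust in data_file, weighted by their mutual trust with u."""
    num = den = 0.0
    for v, files in trust.items():
        if v == u or data_file not in files:
            continue
        w = mutual_trust(trust, u, v)
        num += w * files[data_file]
        den += w
    return num / den if den > 0 else 0.0


def recommend(records: List[Record], u: str, k: int = 3) -> List[Tuple[str, float]]:
    """Rank files u has not used yet by predicted trust (ties broken by file name)."""
    trust = data_trust(records)
    used = set(trust.get(u, {}))
    candidates = {f for files in trust.values() for f in files} - used
    scored = [(f, predict_trust(trust, u, f)) for f in candidates]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Hypothetical usage: alice agrees with bob and disagrees with carol on shared files.
records = [
    ("alice", "ref_A.fasta", True), ("alice", "ref_B.fasta", True),
    ("bob", "ref_A.fasta", True), ("bob", "ref_B.fasta", True), ("bob", "ref_C.fasta", True),
    ("carol", "ref_A.fasta", False), ("carol", "ref_C.fasta", False), ("carol", "ref_D.fasta", True),
]
print(recommend(records, "alice"))
```

Here alice's mutual trust with bob is 1.0 and with carol is 0.0, so `ref_C.fasta` (trusted by bob) ranks first and `ref_D.fasta` (trusted only by carol) scores 0.0. A real system would likely add smoothing for files with few runs and a fallback for users with no history; those choices are left open here.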
