Reasoning About Discovery Clouds

A discovery cloud is a set of automated, cloud-hosted services to which individuals may outsource their routine and not-so-routine research tasks: finding relevant data, inferring links between data, running computational experiments, inferring new knowledge claims, evaluating the credibility of knowledge claims produced by others, designing experiments, and so on. If developed successfully, a discovery cloud can accelerate and democratize both access to data and knowledge tools and the collaborative construction of new knowledge. Such systems are also fascinating to consider from a reasoning perspective because they integrate great complexity at multiple levels: the underlying cloud-based hardware and software, for which issues of reliability and responsiveness may be paramount; the knowledge bases and inference engines that sit on that cloud substrate, for which issues of correctness may be less well defined; and the human communities that form around the discovery clouds, and that are arguably as much a part of the cloud as the hardware, software, and data. I raise questions here about what it might mean to reason about such systems. I do not provide any answers.
