ACCELERATING SCIENCE: A COMPUTING RESEARCH AGENDA

The emergence of “big data” offers unprecedented opportunities not only for accelerating scientific advances but also for enabling new modes of discovery. Scientific progress in many disciplines is increasingly enabled by our ability to examine natural phenomena through the computational lens, i.e., using algorithmic or information processing abstractions of the underlying processes, and by our ability to acquire, share, integrate, and analyze disparate types of data. However, there is a huge gap between our ability to acquire, store, and process data and our ability to make effective use of the data to advance discovery. Despite successful automation of routine aspects of data management and analytics, most elements of the scientific process currently require considerable human expertise and effort. Accelerating science to keep pace with the rate of data acquisition and data processing calls for the development of algorithmic or information processing abstractions, coupled with formal methods and tools for modeling and simulation of natural processes, as well as major innovations in cognitive tools for scientists, i.e., computational tools that leverage and extend the reach of human intellect and partner with humans on a broad range of tasks in scientific discovery (e.g., identifying, prioritizing, and formulating questions; designing, prioritizing, and executing experiments to answer a chosen question; drawing inferences and evaluating the results; and formulating new questions, in a closed-loop fashion). This calls for a concerted research agenda aimed at: the development, analysis, integration, sharing, and simulation of algorithmic or information processing abstractions of natural processes, coupled with formal methods and tools for their analysis and simulation; and innovations in cognitive tools that augment and extend human intellect and partner with humans in all aspects of science.
This in turn requires: the formalization, development, and analysis of algorithmic or information processing abstractions of various aspects of the scientific process; the development of computational artifacts (representations, processes, protocols, workflows, software) that embody such understanding; and the integration of the resulting cognitive tools into collaborative human-machine systems and infrastructure to advance science.
