Accelerating Science: A Computing Research Agenda

Author(s): Honavar, Vasant G; Hill, Mark D; Yelick, Katherine

Abstract: The emergence of "big data" offers unprecedented opportunities not only for accelerating scientific advances but also for enabling new modes of discovery. Scientific progress in many disciplines is increasingly enabled by our ability to examine natural phenomena through the computational lens, i.e., using algorithmic or information processing abstractions of the underlying processes, and by our ability to acquire, share, integrate, and analyze disparate types of data. However, there is a huge gap between our ability to acquire, store, and process data and our ability to make effective use of the data to advance discovery. Despite successful automation of routine aspects of data management and analytics, most elements of the scientific process currently require considerable human expertise and effort. Accelerating science to keep pace with the rate of data acquisition and data processing calls for the development of algorithmic or information processing abstractions, coupled with formal methods and tools for modeling and simulation of natural processes, as well as major innovations in cognitive tools for scientists, i.e., computational tools that leverage and extend the reach of human intellect and partner with humans on a broad range of tasks in scientific discovery (e.g., identifying, prioritizing, and formulating questions; designing, prioritizing, and executing experiments to answer a chosen question; drawing inferences and evaluating the results; and formulating new questions, in a closed-loop fashion). This calls for a concerted research agenda aimed at: (1) development, analysis, integration, sharing, and simulation of algorithmic or information processing abstractions of natural processes, coupled with formal methods and tools for their analysis and simulation; and (2) innovations in cognitive tools that augment and extend human intellect and partner with humans in all aspects of science.
