New Horizons for a Data-Driven Economy

In this book, readers will find technical discussions of existing and emerging technologies across the different stages of the big data value chain. They will learn about the legal aspects of big data, its social impact, and the associated education needs and requirements. They will also discover the business perspective and how big data technology can be exploited to deliver value within different sectors of the economy.

The book is structured in four parts. Part I, "The Big Data Opportunity," explores the value potential of big data with a particular focus on the European context. It also describes the legal, business and social dimensions that need to be addressed, and briefly introduces the European Commission's BIG project. Part II, "The Big Data Value Chain," details the complete big data lifecycle from a technical point of view, ranging from data acquisition, analysis, curation and storage to data usage and exploitation. Next, Part III, "Usage and Exploitation of Big Data," illustrates the value creation possibilities of big data applications in various sectors, including industry, healthcare, finance, energy, media and public services. Finally, Part IV, "A Roadmap for Big Data Research," identifies and prioritizes the cross-sectoral requirements for big data research, and outlines the most urgent and challenging technological, economic, political and societal issues for big data in Europe.

This compendium summarizes more than two years of work performed by a leading group of major European research centers and industries in the context of the BIG project. It brings together research findings, forecasts and estimates related to this challenging technological context, which is becoming the major axis of the new digitally transformed business environment.