An Integration-Based Conceptual Framework for Scientific Information Analysis

The era of big data brings both opportunities and challenges to the scientific information analysis (SIA) and intelligent information services performed at the National Science Library, Chinese Academy of Sciences (NSLC). There is an urgent need to develop a new SIA framework that expedites big data acquisition and processing and improves the quality of information services. This paper describes, through a case study, the traditional SIA workflow currently applied at NSLC. It also reviews progress in massive heterogeneous data integration, in data management and analytics methods, and in their applications. After examining the limitations of the current workflow, we propose an integration-based conceptual framework for SIA. The new framework is characterized by the development of a Knowledge Resources Integration System (KRIS) that can store, organize, process, and visualize heterogeneous data. We explain the functions and characteristics of the proposed framework and outline strategies for implementing a web-based data warehouse system based on it. The paper concludes with a discussion of future research on implementing KRIS for SIA.
