Solving problems of research information heterogeneity during integration - using the European CERIF and German RCD standards as examples

Integrating data from a variety of heterogeneous internal and external data sources (e.g. CERIF and RCD data models with different modeling languages) into a federated database system such as a Research Information Management System (RIMS) is becoming increasingly challenging for national and international universities and research institutions. Data quality is an important factor for the successful integration and interpretation of research information and for the interoperability of independent information systems. Before the data are loaded into the RIMS, they should be reviewed during the data integration process to resolve conflicts between the different data sources and to clean up data quality issues. Poor data quality distorts how data are presented and thus provides an erroneous basis for decisions. This ultimately imposes costs on scientific institutions, and the problem begins when research information is integrated into the RIMS. Investing in information integration is therefore worthwhile insofar as achieving high data quality is of primary importance. This paper presents methods, processes and techniques of information integration in the context of research information management systems, in order to ensure the quality of research information from an institution's data sources during its integration into the RIMS. Universities and research institutions have already made numerous attempts to develop techniques and solutions for this need.
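The review step described above (mapping heterogeneous sources onto one target schema, cleansing values, and detecting conflicts before loading into the RIMS) can be illustrated with a minimal Python sketch. It assumes two simplified, hypothetical input formats: one loosely CERIF-like and one loosely RCD-like with German field labels. All field names (`title`, `date`, `identifier`, `titel`, `jahr`, `doi`) and the unified `Publication` record are illustrative inventions, not the actual CERIF or RCD schemas, and matching records by DOI is an assumed strategy rather than the method of the paper.

```python
import unicodedata
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Publication:
    """Hypothetical unified target record in the RIMS (illustrative fields only)."""
    title: str
    year: int
    doi: Optional[str]
    source: str


def clean_title(raw: str) -> str:
    """Cleansing: Unicode-normalize (NFKC) and collapse repeated whitespace."""
    return " ".join(unicodedata.normalize("NFKC", raw).split())


def clean_doi(raw: Optional[str]) -> Optional[str]:
    """Strip a resolver prefix and lowercase (DOI names are case-insensitive)."""
    if not raw:
        return None
    doi = raw.strip().lower()
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi or None


def map_source_a(rec: dict) -> Publication:
    # Hypothetical CERIF-like export: ISO date string, DOI stored under "identifier".
    return Publication(
        title=clean_title(rec["title"]),
        year=int(rec["date"][:4]),
        doi=clean_doi(rec.get("identifier")),
        source="A",
    )


def map_source_b(rec: dict) -> Publication:
    # Hypothetical RCD-like export: German field labels and an integer year.
    return Publication(
        title=clean_title(rec["titel"]),
        year=int(rec["jahr"]),
        doi=clean_doi(rec.get("doi")),
        source="B",
    )


def integrate(records_a: list, records_b: list):
    """Map both sources to the target schema, merge on DOI, and report conflicts."""
    merged: dict = {}
    conflicts: list = []
    mapped = [map_source_a(r) for r in records_a] + [map_source_b(r) for r in records_b]
    for pub in mapped:
        # Fall back to the case-folded title when a record has no DOI.
        key = pub.doi or pub.title.casefold()
        if key not in merged:
            merged[key] = pub
            continue
        existing = merged[key]
        # Attribute-level conflicts: same entity, differing values across sources.
        if existing.year != pub.year:
            conflicts.append((key, "year", existing.year, pub.year))
        if existing.title.casefold() != pub.title.casefold():
            conflicts.append((key, "title", existing.title, pub.title))
    return list(merged.values()), conflicts


if __name__ == "__main__":
    a = [{"title": "Data  Quality in RIS", "date": "2019-05-01",
          "identifier": "https://doi.org/10.0000/DEMO"}]
    b = [{"titel": "Data Quality in RIS", "jahr": 2018, "doi": "10.0000/demo"}]
    pubs, conflicts = integrate(a, b)
    print(conflicts)  # [('10.0000/demo', 'year', 2019, 2018)]
```

The sketch only detects conflicts and keeps the first record it sees; how a conflict is actually resolved (for example by source priority or by manual review) is a policy decision left open here.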