A flexible architecture for data mining from heterogeneous data sources in automated production systems

Data heterogeneity and proprietary interfaces present a major challenge for big data analytics: data generated by a multitude of sources has to be aggregated and integrated before it can be evaluated. Automatically integrating this data and providing it via defined interfaces in a generic data format could greatly reduce the effort needed to collect and prepare data for analysis in automated production systems. In addition, sharing specific data with customers and suppliers, as well as processing data in near real time, can increase the information gained from analysis. Existing approaches to automatic data integration do not fulfill all of these requirements. A flexible architecture is therefore proposed that simplifies the integration, handling, and sharing of data across organizational borders. Special focus is put on the ability to process near real-time data, which is common in the field of automated production systems. The architecture was evaluated with technical experts from the field of automation by adapting the generic concept to specific use cases. The evaluation showed that the proposed architecture could overcome the disadvantages of current systems and reduce the effort spent on data integration. The proposed architecture can therefore be an enabler for automated analysis of distributed data from sources with heterogeneous data formats in automated production systems.
