Paper Information - NEMA: Automatic Integration of Large Network Management Databases

Network management, whether for malfunction analysis, failure prediction, or performance monitoring and improvement, generally involves large amounts of data from different sources. To integrate and manage these sources effectively, it is crucial to find semantic matches among their schemas or ontologies automatically. Existing approaches to database matching fall mainly into two categories. The first performs schema-level matching based on schema properties such as field names, data types, constraints, and schema structures. Network management databases, however, contain massive tables (e.g., network products, incidents, security alerts, and logs) from different departments and groups, with nonuniform field names and schema characteristics, so matching them by those schema properties is unreliable. The second category performs instance-level matching using general string similarity techniques, which are not applicable to the matching of large network management databases. In this paper, we develop a matching technique for large NEtwork MAnagement databases (NEMA) that deploys instance-level matching for effective data integration and connection. We design matching metrics and scores for both numerical and non-numerical fields and propose algorithms for matching these fields. We evaluate the effectiveness and efficiency of NEMA through experiments based on ground-truth field pairs in large network management databases. Our measurements on large databases with 1,458 fields, each of which contains over 10 million records, show that NEMA reaches accuracies of up to 95%. It achieves 2%-10% higher accuracy and a 5x-14x speedup over baseline methods.
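To make the idea of instance-level matching concrete, the following is a minimal sketch that scores a pair of fields from their values rather than their names. It is not NEMA's method: the abstract does not specify its metrics, so the Jaccard score for non-numerical fields, the range-overlap score for numerical fields, and all function names here are assumptions chosen purely for illustration.

```python
from typing import Iterable, Sequence


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard similarity between the distinct values of two fields.

    Assumed stand-in for a non-numerical matching score.
    """
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def range_overlap(a: Sequence[float], b: Sequence[float]) -> float:
    """Overlap of the [min, max] intervals of two numerical fields, in [0, 1].

    Assumed stand-in for a numerical matching score.
    """
    if not a or not b:
        return 0.0
    lo = max(min(a), min(b))
    hi = min(max(a), max(b))
    span = max(max(a), max(b)) - min(min(a), min(b))
    if span == 0:
        return 1.0  # both fields hold the same single value
    return max(0.0, hi - lo) / span


def is_numeric(values: Sequence[object]) -> bool:
    """True if every value is an int or float (booleans excluded)."""
    return all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )


def match_score(a: Sequence[object], b: Sequence[object]) -> float:
    """Score a field pair, choosing the metric by field type."""
    if is_numeric(a) and is_numeric(b):
        return range_overlap(a, b)
    if is_numeric(a) or is_numeric(b):
        return 0.0  # a numerical field is never matched to a non-numerical one
    return jaccard(map(str, a), map(str, b))


if __name__ == "__main__":
    # Hypothetical samples from two tables with differently named fields.
    incident_device = ["rtr-01", "rtr-02", "sw-17"]
    alert_host = ["rtr-02", "sw-17", "fw-03"]
    print(match_score(incident_device, alert_host))
```

On databases of the size described above, a practical version would score samples or sketches of each field rather than full columns; that choice, like the metrics themselves, is outside what the abstract states.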
