Efficient model similarity estimation with robust hashing

As model-driven engineering (MDE) is increasingly adopted in complex industrial scenarios, modeling artefacts become key strategic assets for companies. Any MDE ecosystem must therefore provide mechanisms to protect and exploit them. Current approaches depend on calculating the relative similarity between pairs of models. Unfortunately, model similarity calculation mechanisms are computationally expensive, which prevents their use in large repositories or on very large models. This paper therefore explores the adaptation of the robust hashing technique to the MDE domain as an efficient estimation method for model similarity. Robust hashing algorithms (i.e., hashing algorithms that generate similar outputs from similar input data) have proved useful as a key building block in intellectual property protection, authenticity assessment, and fast comparison and retrieval solutions in different application domains. We present a detailed method for generating robust hashes for different types of models. Our approach translates to the MDE domain diverse techniques, such as summary extraction, minhash generation and locality-sensitive hash function families, that were originally developed for the comparison and classification of large datasets. We validate our approach with a prototype implementation and show that: (1) our approach can deal with any graph-based model representation; (2) a strong correlation exists between the similarity calculated directly on the robust hashes and a distance metric calculated over the original models; and (3) our approach scales well on large models and greatly reduces the time required to find similar models in large repositories.
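To make the minhash step concrete, the following is a minimal Python sketch of how a fixed-length signature can estimate the Jaccard similarity of two models. It is an illustration under stated assumptions, not the paper's implementation: each model is assumed to have already been summarized as a set of string tokens (the token format such as `Class:Person` is hypothetical), and the seeded SHA-1 hash family and the function names `minhash_signature` and `estimated_similarity` are choices made for this example only.

```python
import hashlib
from typing import Iterable, List, Set


def _seeded_hash(token: str, seed: int) -> int:
    """Map a token to a 64-bit integer; each seed acts as a distinct hash function."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")


def minhash_signature(tokens: Iterable[str], num_hashes: int = 128) -> List[int]:
    """Return the minhash signature: the minimum hash value per seeded function."""
    token_set = set(tokens)
    if not token_set:
        raise ValueError("cannot compute a minhash signature of an empty set")
    return [
        min(_seeded_hash(token, seed) for token in token_set)
        for seed in range(num_hashes)
    ]


def estimated_similarity(sig_a: List[int], sig_b: List[int]) -> float:
    """Estimate Jaccard similarity as the fraction of matching signature positions."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Exact Jaccard similarity, used here only as a reference value."""
    return len(a & b) / len(a | b)


# Hypothetical model summaries: one token per model element (assumed format).
model_a = {"Class:Person", "Attr:Person.name", "Attr:Person.age", "Class:Address"}
model_b = {"Class:Person", "Attr:Person.name", "Class:Address", "Ref:Person.home"}

signature_a = minhash_signature(model_a)
signature_b = minhash_signature(model_b)
print("estimated:", estimated_similarity(signature_a, signature_b))
print("exact:    ", jaccard(model_a, model_b))
```

Comparing two signatures costs time proportional to the signature length rather than to the model size, which is the property that makes this kind of estimate attractive for large repositories. A locality-sensitive hashing index would typically be layered on top of such signatures to avoid comparing every pair.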
