Suitability of Optical Character Recognition (OCR) for Multi-domain Model Management

Systems developed with model-driven engineering can include models from different domains. For example, developing a mechatronic component might require combining expertise in mechanics, electronics, and software. Although these models belong to different domains, changes in one model can affect other models, causing inconsistencies across the entire system. There are, however, only a limited number of tools that support the management of models from different domains. These models are created using different modeling notations, and it is not feasible to use a multitude of parsers, one geared towards each modeling notation. Therefore, to ensure that multi-domain systems can be maintained, we need a uniform approach that is independent of the peculiarities of any notation. Such a uniform approach can only be based on something that is present in all those models, i.e., text, boxes, and lines. In this study we investigate the suitability of optical character recognition (OCR) as a basis for such a uniform approach. We select graphical models from various domains that typically combine textual and graphical elements, and we focus on text recognition without looking for additional shapes. We analyze the performance of two off-the-shelf OCR services, Google Cloud Vision and Microsoft Cognitive Services. Google Cloud Vision performed better than Microsoft Cognitive Services, detecting the text of 70% of model elements. Errors made by Google Cloud Vision are due to the absence of support for text common in engineering formulas, e.g., Greek letters, equations, and subscripts, as well as text typeset on multiple lines. We believe that once these shortcomings are addressed, OCR can become a crucial technology supporting multi-domain model management.
