Identification of microservices from monolithic applications through topic modelling

Microservices have emerged as one of the most popular architectural patterns in recent years, driven by the increased need to scale, grow and make software projects more flexible, together with the growth of cloud computing and DevOps. Many software applications are being migrated from a monolithic architecture to a more modular, scalable and flexible microservice architecture. This process is slow and, depending on the project's complexity, may take months or even years to complete. This paper proposes a new approach to microservice identification that uses topic modelling to identify services according to domain terms. Combined with clustering techniques, this approach produces a set of services based on the original software. The proposed methodology is implemented as an open-source tool for exploring monolithic architectures and identifying microservices. A quantitative analysis using state-of-the-art metrics on independence of functionality and modularity of services was conducted on 200 open-source projects collected from GitHub. The metrics for cohesion at message level and at domain level showed medians of roughly 0.6. Interfaces per service exhibited a median of 1.5 with a compact interquartile range. Structural and conceptual modularity revealed medians of 0.2 and 0.4, respectively. Our first results are positive, with the overall metric values indicating a beneficial identification of services.
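To make the idea of grouping code by domain terms more concrete, the following is a minimal sketch and not the tool described above. It assumes class identifiers are the only input, and it swaps in a plain bag-of-terms cosine similarity with connected-component grouping in place of topic modelling and the clustering techniques the paper refers to. All names (`domain_terms`, `group_classes`, the `threshold` value, and the example classes) are hypothetical.

```python
import re
from collections import Counter
from math import sqrt


def domain_terms(identifier):
    """Split a camelCase or snake_case identifier into lowercase terms."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", identifier)
    return [p.lower() for p in parts]


def term_vector(identifiers):
    """Bag-of-terms vector for one class, built from its identifiers."""
    return Counter(t for ident in identifiers for t in domain_terms(ident))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def group_classes(classes, threshold=0.3):
    """Group classes whose term vectors are similar.

    Simplified stand-in for topic modelling plus clustering: two classes
    land in the same group when a chain of pairwise similarities at or
    above `threshold` connects them (connected components via union-find).
    """
    vectors = {name: term_vector(idents) for name, idents in classes.items()}
    names = sorted(vectors)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


# Hypothetical monolith: class name -> identifiers found in that class.
classes = {
    "OrderService": ["OrderService", "createOrder", "orderTotal"],
    "OrderRepository": ["OrderRepository", "findOrder", "saveOrder"],
    "UserAccount": ["UserAccount", "userEmail", "loginUser"],
    "UserProfile": ["UserProfile", "userName", "updateUser"],
}
services = group_classes(classes)
for service in services:
    print(service)
```

In this toy input the order-related and user-related classes share their dominant term, so they fall into two separate groups. A real topic model would instead infer latent topics over many terms, which is what allows grouping by domain vocabulary even when class names do not overlap directly.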
