User-centred tooling for modelling of big data applications

We outline the key requirements for a big data modelling recommender tool. Our web-based tool is suited to capturing system requirements in big data analytics applications that involve diverse stakeholders. It promotes awareness of the datasets and algorithm implementations that are available to leverage in the design of the solution. We implement these ideas in BiDaML-web, a proof-of-concept recommender system for big data applications, and evaluate the tool in an empirical study with a group of 16 target end-users. Participants found the integrated recommender and technique suggestion tools helpful and rated the overall BiDaML web-based modelling experience highly. BiDaML-web is available at https://bidaml.web.app/ and the source code can be accessed at https://github.com/tarunverma23/bidaml.
