Dominoes: An Interactive Exploratory Data Analysis Tool for Software Relationships

Project comprehension questions, such as “which modified artifacts can affect my work?” and “how can I identify the developers who should be assigned to a given task?”, are difficult to answer: they require an analysis of the project and its data, are context specific, and cannot always be predefined. Current research approaches are restricted to post hoc analyses of software repositories. Very few interactive exploratory tools exist, because the large amount of data that must be analyzed prevents its exploration at interactive rates. Moreover, such analyses typically require the user to write complex scripts or queries to extract the desired information from the data. Here we present Dominoes, a tool for interactive data exploration aimed at end users (i.e., project managers or developers). Dominoes allows users to interact with different types and units of data to investigate project relationships and to view intermediate results as charts, tables, and graphs. Additionally, it allows users to save the derived data as well as their exploration paths for later use. In a scenario-based evaluation study, participants achieved a success rate of 86% in their explorations, with a mean time of 7.25 minutes for answering a set of project exploration questions.
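The abstract does not say how relationships between different types of data are combined. As a purely illustrative sketch, assume that each relationship (for example, developer-to-commit or commit-to-file) is stored as a 0/1 matrix. Two relationships that share an entity type can then be chained by matrix multiplication to derive a new one, such as developer-to-file (a candidate answer to "who should work on this file?") or file-to-file co-change (a candidate answer to "which modified artifacts can affect my work?"). The names `compose`, `transpose`, `top_developers` and the sample data are hypothetical and are not taken from Dominoes.

```python
from typing import List

# Hypothetical representation: a relationship is a 0/1 matrix whose rows and
# columns correspond to two entity types (e.g. developers x commits).
Matrix = List[List[int]]


def transpose(m: Matrix) -> Matrix:
    """Swap rows and columns, e.g. turn commit x file into file x commit."""
    return [list(row) for row in zip(*m)]


def compose(a: Matrix, b: Matrix) -> Matrix:
    """Chain an (m x k) relationship with a (k x n) one via matrix product.

    Entry [i][j] of the result counts how many intermediate items
    (e.g. commits) link row entity i to column entity j.
    """
    if not a or len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    n = len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(n)]
        for i in range(len(a))
    ]


def top_developers(dev_file: Matrix, developers: List[str],
                   files: List[str], file_name: str) -> List[str]:
    """Rank developers who touched file_name by how many commits link them to it."""
    j = files.index(file_name)
    scored = [(dev_file[i][j], d) for i, d in enumerate(developers) if dev_file[i][j] > 0]
    # Highest count first; ties broken alphabetically for a stable order.
    return [d for _, d in sorted(scored, key=lambda t: (-t[0], t[1]))]


if __name__ == "__main__":
    developers = ["alice", "bob"]
    files = ["A.java", "B.java"]
    dev_commit = [[1, 1, 0],   # alice authored c1 and c2
                  [0, 0, 1]]   # bob authored c3
    commit_file = [[1, 0],     # c1 touches A.java
                   [1, 1],     # c2 touches A.java and B.java
                   [0, 1]]     # c3 touches B.java

    dev_file = compose(dev_commit, commit_file)
    file_file = compose(transpose(commit_file), commit_file)
    print(dev_file)    # [[2, 1], [0, 1]]
    print(file_file)   # [[2, 1], [1, 2]]
    print(top_developers(dev_file, developers, files, "A.java"))  # ['alice']
```

Chaining matrices this way keeps every intermediate result (such as `dev_file`) as an ordinary relationship that can itself be combined further, which matches the abstract's emphasis on viewing and saving intermediate, derived data.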
