Predicting Software Outcomes Using Data Mining and Text Mining

Organizations spend a major portion of their Information Technology budget on software maintenance. In this paper, we present a predictive model for the maintenance outcomes of software projects and identify the factors that affect those outcomes. We build the model by applying Data Mining (DM) techniques to Open Source Software (OSS) project data, drawing on the publicly accessible data archives of over 100,000 projects hosted by SourceForge.net (SF). Prior research in software engineering guides our choice of the initial set of variables used in model building. We apply multiple DM techniques available in SAS® Enterprise Miner™ and create additional variables from the textual data provided by SF using SAS® Text Miner. These text-derived variables significantly improve the model. We select the final model based on domain knowledge and the fit statistics of the candidate models. Results indicate that end-user participation, product functionality, and usefulness of the project affect software maintenance quality.
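
The idea of deriving new model variables from textual project data can be illustrated with a minimal sketch. This is not SAS® Text Miner and not the procedure used in this paper; it is a plain-Python stand-in that turns frequently occurring description terms into binary indicator variables. The project records, the field names (`name`, `downloads`, `description`), and the helper functions are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical project records; names, fields, and values are illustrative only.
projects = [
    {"name": "alpha", "downloads": 1200, "description": "Backup tool for database servers"},
    {"name": "beta", "downloads": 300, "description": "Game engine with database support"},
    {"name": "gamma", "downloads": 50, "description": "Simple backup script"},
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def top_terms(records, k, min_docs=2):
    """Return up to k terms that occur in at least min_docs descriptions."""
    doc_freq = Counter()
    for record in records:
        # Count each term once per description (document frequency).
        doc_freq.update(set(tokenize(record["description"])))
    frequent = [(term, n) for term, n in doc_freq.items() if n >= min_docs]
    # Most frequent first; ties broken alphabetically for a stable order.
    frequent.sort(key=lambda item: (-item[1], item[0]))
    return [term for term, _ in frequent[:k]]

def add_text_variables(records, terms):
    """Append one 0/1 indicator per term to each record's structured variables."""
    rows = []
    for record in records:
        tokens = set(tokenize(record["description"]))
        row = {"name": record["name"], "downloads": record["downloads"]}
        for term in terms:
            row["has_" + term] = int(term in tokens)
        rows.append(row)
    return rows

terms = top_terms(projects, k=5)
rows = add_text_variables(projects, terms)
```

The resulting rows combine the structured variable (`downloads`) with the text-derived indicators, so candidate models can be fitted with and without the indicators and their fit statistics compared.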