Predicting Web Development Effort Using a Bayesian Network

OBJECTIVE - The objective of this paper is to investigate the use of a Bayesian Network (BN) for Web effort estimation. METHOD - We built a BN automatically using the HUGIN tool and data on 120 Web projects from the Tukutuku database. In addition, the BN model and node probability tables were validated by a Web project manager from a well-established Web company in Rio de Janeiro (Brazil). Accuracy was measured using point estimates for a validation set of 30 projects (1-fold cross-validation using an 80%-20% split). The BN estimates were also compared to estimates obtained using forward stepwise regression (SWR), as this is one of the most frequently used techniques for software and Web effort estimation. RESULTS - Our results showed that BN-based predictions were better than previous predictions from Web-based cross-company models, and significantly better than predictions using SWR. CONCLUSIONS - Our results suggest that, at least for the dataset used, a model that allows the representation of the uncertainty inherent in effort estimation can outperform other commonly used models, such as those built using multivariate regression techniques.
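The 80%-20% hold-out evaluation described above can be illustrated with a minimal sketch. It assumes a total of 150 projects (inferred from the 120 training and 30 validation projects) and uses the mean magnitude of relative error as an example accuracy statistic; the abstract does not name the statistics actually used, and the function names `holdout_split` and `mean_mre` are hypothetical.

```python
import random


def holdout_split(project_ids, train_fraction=0.8, seed=0):
    """Shuffle project ids and split them into training and validation sets."""
    ids = list(project_ids)
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split reproducible
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


def mean_mre(actual, predicted):
    """Mean magnitude of relative error: average of |actual - predicted| / actual.

    Assumed here as an illustrative accuracy measure for point estimates;
    it is not stated in the abstract.
    """
    errors = [abs(a - p) / a for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


# Assumed total of 150 projects, giving 120 for training and 30 for validation.
train_ids, validation_ids = holdout_split(range(150))
print(len(train_ids), len(validation_ids))  # 120 30
```

In such a setup, both the BN and the regression model would be fitted on the training projects only, and their point estimates for the 30 validation projects would be scored with the same statistic so that the comparison is like for like.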
