Reverse engineering of web applications

Web applications are hard to maintain for three reasons: their components are heterogeneous and dynamic, they lack effective programming mechanisms for applying basic software engineering principles, and they are often built through undisciplined development processes driven by the pressure of a very short time-to-market. A relevant question is whether the methodological and technological experience gained in traditional software maintenance can be reused, and in particular whether reverse engineering can support effective Web application maintenance. This article presents an approach for reverse engineering Web applications. The approach includes the definition of reverse engineering methods and supporting software tools that help maintainers understand existing, undocumented Web applications that must be maintained or evolved, by reconstructing UML diagrams of them. Validation experiments showed the usefulness of the proposed approach and highlighted possible areas for improving its effectiveness.
