Auto-locating and fix-propagating for HTML validation errors to PHP server-side code

Checking and correcting HTML validation errors in Web pages helps Web developers find and fix bugs. However, existing validation and fixing tools work well only on static HTML pages. Because of several challenges posed by dynamically generated pages in Web development, these tools do not help fix the server code that produced a page when validation errors are found in it. We propose PhpSync, a novel tool that automatically locates and fixes HTML validation errors in PHP-based Web applications. Given an HTML page produced by a server-side PHP program, PhpSync uses Tidy, an HTML validating and correcting tool, to find the validation errors in that page. If errors are detected, PhpSync takes the fixes that Tidy makes to the HTML page and propagates them to the corresponding location(s) in the PHP code. Our core solutions include 1) a symbolic execution algorithm on the given PHP program that produces a single tree-based model, called the D-model, which approximately represents the program's possible client page outputs; 2) an algorithm that maps any text in the given HTML page to the text(s) in the node(s) of the D-model and from there to the PHP code; and 3) a fix-propagating algorithm that carries the fixes in the HTML page over to the PHP code via the D-model and the mapping algorithm. Our empirical evaluation shows that, on average, PhpSync achieves 96.7% accuracy in locating the PHP code that corresponds to locations in client pages, and 95% accuracy in propagating the fixes to the server-side code.
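The three core pieces described above (a tree model of possible outputs, a mapping from page text back to PHP, and fix propagation) can be illustrated with a toy sketch. It assumes the D-model has already been built by symbolic execution; the node kinds Literal, Symbolic, Concat, Select and Repeat, their fields, and the functions match, map_page and propagate_insert are hypothetical names chosen for illustration, not PhpSync's actual data structures or algorithms. The mapping here is plain backtracking over the tree, and only insertion fixes (such as a missing close tag that a validator like Tidy might add) are propagated.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple

# Hypothetical node kinds for a D-model-like tree; PhpSync's real D-model may differ.
@dataclass
class Literal:
    text: str  # string constant as emitted by the PHP code (escapes already decoded)
    line: int  # PHP source line of the constant
    col: int   # 0-based column where the constant's content starts

@dataclass
class Symbolic:
    name: str  # value unknown at analysis time, e.g. $name from user input

@dataclass
class Concat:
    parts: list  # children emitted one after another

@dataclass
class Select:
    alternatives: list  # one child per branch of an if/else

@dataclass
class Repeat:
    body: object  # loop body, emitted zero or more times

# (start, end, node): html[start:end] was produced by node
Segment = Tuple[int, int, object]

def match(node, html: str, pos: int) -> Iterator[Tuple[int, List[Segment]]]:
    """Yield (end, segments) for every way node can produce html[pos:end]."""
    if isinstance(node, Literal):
        if html.startswith(node.text, pos):
            yield pos + len(node.text), [(pos, pos + len(node.text), node)]
    elif isinstance(node, Symbolic):
        # Assumption: symbolic output contains no markup, so it stops before the next '<'.
        limit = html.find("<", pos)
        limit = len(html) if limit == -1 else limit
        for stop in range(limit, pos - 1, -1):  # longest candidate first
            yield stop, [(pos, stop, node)]
    elif isinstance(node, Concat):
        yield from match_seq(node.parts, html, pos)
    elif isinstance(node, Select):
        for alt in node.alternatives:
            yield from match(alt, html, pos)
    elif isinstance(node, Repeat):
        yield from match_repeat(node.body, html, pos)

def match_seq(parts, html, pos):
    if not parts:
        yield pos, []
        return
    for mid, first in match(parts[0], html, pos):
        for end, rest in match_seq(parts[1:], html, mid):
            yield end, first + rest

def match_repeat(body, html, pos):
    # Prefer one more iteration; only iterations that consume text are allowed, so this terminates.
    for mid, first in match(body, html, pos):
        if mid > pos:
            for end, rest in match_repeat(body, html, mid):
                yield end, first + rest
    yield pos, []  # stop iterating here

def map_page(model, html: str) -> Optional[List[Segment]]:
    """Return the first derivation that covers the whole page, or None."""
    for end, segments in match(model, html, 0):
        if end == len(html):
            return segments
    return None

def propagate_insert(segments: List[Segment], offset: int, text: str):
    """Map an insertion at an HTML offset to a (line, col, text) edit in the PHP source."""
    for start, end, node in segments:
        # At a boundary between two literals, the earlier literal wins.
        if isinstance(node, Literal) and start <= offset <= end:
            # Assumes the literal is single-line with no escape sequences.
            return node.line, node.col + (offset - start), text
    return None  # offset only touches symbolic output; no string constant to edit

# Hypothetical PHP source for MODEL (line number, then code; columns are 0-based per line):
#   1: <?php echo "<div>";
#   2: if ($name) {
#   3:   echo "<p><b>" . $name . "</p>";
#   4: } else {
#   5:   echo "<p>Guest</p>";
#   6: }
#   7: echo "</div>";
MODEL = Concat([
    Literal("<div>", 1, 12),
    Select([
        Concat([Literal("<p><b>", 3, 8), Symbolic("$name"), Literal("</p>", 3, 27)]),
        Literal("<p>Guest</p>", 5, 8),
    ]),
    Literal("</div>", 7, 6),
])

if __name__ == "__main__":
    page = "<div><p><b>Ann</p></div>"  # one concrete client page
    segs = map_page(MODEL, page)
    # Suppose the validator's fix inserts "</b>" at offset 14, just before "</p>".
    print(propagate_insert(segs, 14, "</b>"))  # (3, 27, '</b>')
```

In this example the page text "Ann" is attributed to the symbolic $name, so the inserted "</b>" lands at the start of the "</p>" constant on line 3, turning it into "</b></p>". A realistic tool would also have to handle escape sequences and multi-line strings, rank ambiguous derivations instead of taking the first one, and support deletions and replacements, none of which this sketch attempts.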
