Towards a benchmark for Web site extractors: a call for community participation

The purpose of this paper is to propose a benchmark for comparing fact extractors for Web sites and to invite interested researchers and practitioners to participate in its development. Fact extraction is a fundamental and difficult problem in both traditional software reverse engineering and Web site reverse engineering. In both domains, the input often contains irregularities that violate an extractor's unstated assumptions. Consequently, it is difficult to predict how an extractor will perform on a given input. To address this problem, we created a benchmark for comparing fact extractors for the C++ programming language. We found that this benchmark improved our understanding of fact extraction, the tools produced, and the maturity of the community. We believe the same approach will benefit Web site extractors, and we propose WebETS (Web site Extractor Test Suite). In this paper we give some starting points for the design of WebETS and ask others to join in the effort.
