A Workflow Language for Web Automation

Most of today's web sources do not provide suitable interfaces for software programs to interact with them. Many researchers have proposed highly effective techniques to address this problem. Nevertheless, ad-hoc solutions are still frequent in real-world web automation applications. Arguably, one reason is that most proposals have focused on query wrappers, which transform a web source into a special kind of database in which some queries can be executed through a query form and return result sets composed of structured data records. Although the query wrapper model is often useful, it is not appropriate for applications that make decisions according to the data retrieved, or for processes that use forms that can be modelled as insert/update/delete operations. This article proposes a new language for defining web automation processes, based on a wide range of real-world web automation tasks that are being used by corporations from different business areas.
