SERGEANT: A framework for building more flexible web agents by exploiting a search engine

With the rapid growth of the World Wide Web, there is growing interest in developing web agents that interact with online services to acquire information. However, an online service perfectly suited to a given task cannot always be found. First, an agent might not be given enough information to fill in the input fields required to query an online service. Second, the online service might return only partial information. Third, an agent might need to derive information B from an input set A but find only online services that generate A from B. Fourth, most online services do not tolerate errors in their inputs, so even a minor typo in an input field can prevent them from returning any meaningful results. This paper proposes SERGEANT, a framework for building flexible web agents that handle these imperfect situations. The framework exploits an information retrieval (IR) system as a general discovery tool to assist in finding and pruning information. To demonstrate SERGEANT, we implemented two web agents: the Internet inverse geocoder and the address lookup module. Our experiments show that these agents are capable of generating high-quality results under imperfect situations.
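
The third situation (needing B from A when the available service only maps B to A) can be illustrated with a minimal sketch. It is not SERGEANT's implementation: `toy_geocode`, `toy_search`, `invert`, and the sample data are hypothetical stand-ins, assuming only that a search step can propose candidate values of B and that the forward service can then be used to prune them.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical forward service data: address (B) -> coordinates (A).
TOY_GEOCODER: Dict[str, str] = {
    "1 Main St": "34.02,-118.28",
    "2 Main St": "34.03,-118.29",
    "9 Oak Ave": "40.71,-74.00",
}


def toy_geocode(address: str) -> str:
    """Hypothetical forward service: returns coordinates, or '' if unknown."""
    return TOY_GEOCODER.get(address, "")


def toy_search(query: str) -> List[str]:
    """Hypothetical stand-in for an IR system: candidate addresses matching a term."""
    return [addr for addr in TOY_GEOCODER if query.lower() in addr.lower()]


def invert(
    target: str,
    candidates: Iterable[str],
    forward: Callable[[str], str],
) -> List[str]:
    """Keep only the candidates whose forward lookup reproduces the target."""
    return [c for c in candidates if forward(c) == target]


# Search proposes candidates; the forward service prunes them.
matches = invert("34.03,-118.29", toy_search("main"), toy_geocode)
```

In this sketch the search step only needs to achieve good recall, since the forward service acts as the filter that discards wrong candidates.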
