Extracting and Customizing Information Using Multi-Agents

Rapidly evolving network and computer technology, coupled with the exponential growth of the services and information available on the Internet, has already brought us to the point where hundreds of millions of people have fast, pervasive access to a phenomenal amount of information, through desktop machines at work, school and home, through televisions, phones, pagers, and car dashboards, from anywhere and everywhere. The challenge of complex environments is therefore obvious: software is expected to do more in more situations; there is a variety of users (power/naive, techie/financial/clerical, ...), a variety of systems (Windows/NT/Mac/Unix, client/server, portable, distributed object managers, the Web, ...), a variety of interactions (real time, databases, other players, ...), and a variety of resources and goals (time, space, bandwidth, cost, security, quality, ...). To cope with such environments, the promise of information customization systems is becoming highly attractive. In this chapter we discuss important problems relating to such systems and smooth the way for possible solutions. The main idea is to approach information customization using a multi-agent paradigm.

INTRODUCTION

The recent proliferation of personal computers and communication networks has had a strong scientific, intellectual and social impact on society. Using a computer network, geographically distributed people can communicate, coordinate, and collaborate on their work across time and space barriers. The quality of the results of exploiting this technology depends on how well individual knowledge can be communicated among the members; that is, how well members can gather the appropriate set of knowledge. The challenge is therefore how to turn the scattered, diverse knowledge available into a well-structured knowledge repository. The general framework for this is that of knowledge management, which has been suggested as a methodology for creating, maintaining and exploiting a knowledge repository (Drucker et al., 1998; Liebowitz and Wilcox, 1997; Schreiber et al., 2000).

The recent popularity of the World Wide Web (Web) has provided a tremendous opportunity to expedite the deployment of various information creation and diffusion infrastructures. The mass of content available on the Web raises important questions about its effective use. With largely unstructured pages authored by a massive range of people on a diverse range of topics, simple browsing has given way to filtering as the practical way to manage Web-based information. Today's online resources are therefore mainly accessible via a panoply of primitive but popular information services such as search engines. Search engines are very effective at filtering pages that match explicit queries.
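To make explicit-query filtering concrete, the following is a minimal sketch (in Python, over a toy page collection; all names are illustrative, not from the chapter) of the inverted-index matching that underlies such services: a page is returned exactly when it contains every query keyword.

```python
from collections import defaultdict

# Toy collection standing in for crawled Web pages.
PAGES = {
    "p1": "multi agent systems for information customization",
    "p2": "search engines index the web for keyword queries",
    "p3": "agent based web mining and information extraction",
}

def build_index(pages):
    """Map each keyword to the set of pages containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.split():
            index[word].add(page_id)
    return index

def search(index, query):
    """Boolean AND retrieval: pages containing every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

index = build_index(PAGES)
print(search(index, "web information"))  # -> {'p3'}
```

Matching here is purely lexical: if the user's keywords do not coincide with a page's wording, relevant pages are missed and irrelevant ones returned, which is the weakness discussed next.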
Unfortunately, most people find it extremely difficult to articulate what they want, especially if forced to use a limited vocabulary such as keywords. The result is long lists of search results that contain only a handful of useful pages, defeating the purpose of filtering in the first place. Search engines also require massive memory resources (to store an index of the Web) and tremendous network bandwidth (to create and continually refresh that index). These systems receive millions of queries per day, and as a result, the CPU cycles devoted to satisfying each individual query are sharply curtailed. There is no time for intelligence. Furthermore, each query is independent of the previous one, and no attempt is made to customize the responses to a particular individual.

What is needed are systems that act on the user's behalf and that rely on existing information services to do the resource-intensive part of the work. Such systems can be sufficiently lightweight to run on an average PC and serve as personal assistants. Since an assistant of this kind has relatively modest resource requirements, it can reside on an individual user's machine, which facilitates customization to that individual. Furthermore, if the assistant resides on the user's machine, there is no need to scale down its intelligence: the system can have substantial local intelligence, and information customization becomes possible (a sketch of such an assistant is given at the end of this section).

The work described here discusses some ideas aimed at reducing the information overflow that is so common today in Web search results. It should be understood in the broader framework of Web mining. Web mining includes the discovery of document content, hyperlink structure, access statistics and other interesting connections of information on the Web. It is interdisciplinary in nature, spanning such fields as information retrieval, natural language processing, information extraction, machine learning, databases, data mining, data warehousing, and knowledge management.
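As an illustration of the personal-assistant idea sketched above, here is a minimal example in Python. It assumes that raw results arrive from some existing search service (not modeled here); the SearchResult and AssistantAgent types, the word-overlap scoring and the feedback rule are illustrative assumptions, not the chapter's method. The resource-intensive retrieval stays on the remote engine, while re-ranking against a locally stored interest profile happens on the user's machine.

```python
from dataclasses import dataclass, field

@dataclass
class SearchResult:
    url: str
    summary: str  # snippet returned by the remote search service

@dataclass
class AssistantAgent:
    """Lightweight client-side agent: a remote engine does the
    resource-intensive retrieval; customization happens locally."""
    # Weighted user-interest profile, kept on the user's machine.
    profile: dict = field(default_factory=dict)

    def score(self, result):
        """Overlap between a result snippet and the user profile."""
        return sum(self.profile.get(w, 0.0)
                   for w in result.summary.lower().split())

    def customize(self, results, top_k=5):
        """Re-rank the engine's results for this particular user."""
        return sorted(results, key=self.score, reverse=True)[:top_k]

    def feedback(self, result, liked=True):
        """Adapt the profile from the user's reaction to a page."""
        delta = 1.0 if liked else -0.5
        for w in set(result.summary.lower().split()):
            self.profile[w] = self.profile.get(w, 0.0) + delta

# Usage: raw results could come from any existing search engine.
raw = [
    SearchResult("http://a.example", "agents for web information customization"),
    SearchResult("http://b.example", "cheap flights and hotel deals"),
]
agent = AssistantAgent(profile={"agents": 2.0, "customization": 1.5})
for r in agent.customize(raw):
    print(r.url)  # personalized order: a.example first
```

Because the profile resides locally, the per-query intelligence that a shared engine cannot afford is applied on the user's machine, and feedback() lets the ranking adapt to the individual over time.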