Text-learning and related intelligent agents

Dunja Mladenić

Analysis of text data using intelligent information retrieval, machine learning, natural language processing, or other related methods is becoming important for the development of intelligent agents. Two approaches are frequently used to develop intelligent agents with machine learning techniques: a content-based approach and a collaborative approach. In the first approach the content (e.g., text) plays an important role, while the second approach requires several knowledge sources (e.g., several users). The application of machine learning techniques to text databases, usually referred to as text-learning, is an important part of the content-based approach. Examples are agents for locating information on the World Wide Web and agents for filtering Usenet news. Several research questions are important for the development of text-learning intelligent agents. We focus on three of them: what representation is used for documents, how the high number of features is dealt with, and which learning algorithm is used. These questions are addressed in an overview of existing approaches to text classification. As an illustration, we briefly describe Personal WebWatcher, a content-based personal intelligent agent that uses text-learning for user-customized Web browsing.
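To make the three questions concrete, here is a minimal sketch of one possible answer to each: a bag-of-words document representation, a document-frequency cut to reduce the number of features, and a multinomial naive Bayes learner. These specific choices, the function names (`tokenize`, `select_features`, `NaiveBayes`) and the toy "interesting"/"boring" pages are assumptions for illustration only; they are not presented as the methods evaluated in the paper or used inside Personal WebWatcher.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Bag-of-words representation: lowercase alphabetic tokens, word order ignored.
    return re.findall(r"[a-z]+", text.lower())


def select_features(docs: list[list[str]], k: int) -> set[str]:
    # Keep the k words that occur in the most documents (document frequency).
    # Ties are broken alphabetically so the result is deterministic.
    # Illustrative choice only: it keeps common words such as "for",
    # which a stopword list or a class-aware score would normally filter.
    df = Counter(word for doc in docs for word in set(doc))
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word for word, _ in ranked[:k]}


class NaiveBayes:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels, vocab):
        self.vocab = vocab
        self.classes = sorted(set(labels))
        self.log_prior = {}
        self.log_likelihood = {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            # Count only words that survived feature selection.
            counts = Counter(w for d in class_docs for w in d if w in vocab)
            total = sum(counts.values())
            self.log_likelihood[c] = {
                w: math.log((counts[w] + 1) / (total + len(vocab))) for w in vocab
            }
        return self

    def predict(self, doc):
        # Words outside the selected vocabulary are simply ignored.
        def score(c):
            return self.log_prior[c] + sum(
                self.log_likelihood[c][w] for w in doc if w in self.vocab
            )

        return max(self.classes, key=score)


# Hypothetical training pages labelled by a user.
train = [
    ("machine learning for text classification", "interesting"),
    ("learning agents filter web documents", "interesting"),
    ("football scores and match results", "boring"),
    ("weather forecast for the weekend", "boring"),
]
docs = [tokenize(text) for text, _ in train]
labels = [label for _, label in train]
vocab = select_features(docs, k=10)
model = NaiveBayes().fit(docs, labels, vocab)

print(model.predict(tokenize("learning about web agents")))  # interesting
print(model.predict(tokenize("weekend football forecast")))  # boring
```

Each of the three pieces can be swapped independently: a different representation (for example, word sequences instead of single words), a different feature-scoring measure, or a different learner, which is why the questions are worth treating separately.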
