CiteSeer: an autonomous Web agent for automatic retrieval and identification of interesting publications

Research papers available on the World Wide Web (WWW or Web) are often poorly organized, often exist in forms opaque to search engines (e.g. PostScript), and increase in quantity daily. Significant amounts of time and effort are typically needed in order to find interesting and relevant publications on the Web. We have developed a Web-based information agent that assists the user in the process of performing a scientific literature search. Given a set of keywords, the agent uses Web search engines and heuristics to locate and download papers. The papers are parsed in order to extract information features such as the abstract and individually identified citations. The agent's Web interface can be used to find relevant papers in the database using keyword searches, or by navigating the links between papers formed by the citations. Links to both citing and cited publications can be followed. In addition to simple browsing and keyword searches, the agent can find papers which are similar to a given paper using word information and by analyzing common citations made by the papers.
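As an illustration of the last point, the following is a minimal sketch of how word information and common citations could be combined to rank papers similar to a given one. The abstract does not specify the measures used, so everything here is an assumption made for illustration: the function names, the record layout (`text`, `cites`), the use of raw term-count cosine similarity for words, Jaccard overlap for citations, and the equal default weighting.

```python
import math
from collections import Counter


def citation_overlap(cites_a, cites_b):
    """Jaccard overlap between two collections of citation identifiers (assumed measure)."""
    a, b = set(cites_a), set(cites_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def word_cosine(text_a, text_b):
    """Cosine similarity between raw term-count vectors of two texts (assumed measure)."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(query_id, papers, weight=0.5):
    """Rank all other papers by a weighted mix of word and citation similarity.

    `papers` maps a hypothetical paper id to {"text": str, "cites": list of ids}.
    `weight` is the share given to word similarity; the rest goes to citations.
    """
    query = papers[query_id]
    scored = []
    for paper_id, paper in papers.items():
        if paper_id == query_id:
            continue
        score = (weight * word_cosine(query["text"], paper["text"])
                 + (1 - weight) * citation_overlap(query["cites"], paper["cites"]))
        scored.append((paper_id, score))
    # Highest combined score first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy records, invented purely for this example.
papers = {
    "p1": {"text": "recurrent neural networks grammar", "cites": ["c1", "c2", "c3"]},
    "p2": {"text": "recurrent neural networks automata", "cites": ["c1", "c2"]},
    "p3": {"text": "musical beat tracking", "cites": ["c9"]},
}
ranking = rank_similar("p1", papers)
```

In this toy data, `p2` shares both vocabulary and citations with `p1` and so ranks above `p3`, which shares neither. A real system would likely weight terms (for example by inverse document frequency) and normalize citations before comparing them.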
