Recent results in automatic Web resource discovery

Classical information retrieval (IR) is concerned with indexing a collection of documents and answering queries by returning a ranked list of relevant documents [14, 21, 24]. With the growth of the web, the problems inherent in natural language, namely ambiguity, context sensitivity, synonymy (two terms with the same meaning), and polysemy (one term with different meanings), together with the abundance of web pages related to prominent topics, have made it harder to fulfill the user’s information need. In response, most search sites have added directory-based topic browsing, in which the web is organized as a tree of topics, similar to the Dewey decimal system, the Library of Congress catalog, or the US Patent and Trademark Office subject codes. Tree nodes are maintained by paid ontologists and/or specialist volunteers, as at Yahoo!, The Mining Co., the WWW Virtual Library, and the Open Directory Project. This strategy may be biased because experts are scarce; at any rate, it is biased away from the most accomplished and busiest people.

[1]  Craig Silverstein,et al.  Analysis of a Very Large AltaVista Query Log , SRC Technical Note #1998-14 , 1998 .

[2]  Jon M. Kleinberg,et al.  Mining the Web's Link Structure , 1999, Computer.

[3]  Jon M. Kleinberg,et al.  Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text , 1998, Comput. Networks.

[4]  Prabhakar Raghavan,et al.  Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies , 1998, The VLDB Journal.

[5]  S. Wasserman,et al.  Social Network Analysis: Computer Programs , 1994 .

[6]  Stanley Wasserman,et al.  Social Network Analysis: Methods and Applications , 1994 .

[7]  Andrei Z. Broder,et al.  A Technique for Measuring the Relative Size and Overlap of Public Web Search Engines , 1998, Comput. Networks.

[8]  Eric W. Brown,et al.  Execution performance issues in full-text information retrieval , 1995 .

[9]  L. R. Rasmussen,et al.  Information retrieval: data structures and algorithms , 1992 .

[10]  Sergey Brin,et al.  The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.

[11]  Michael McGill,et al.  Introduction to Modern Information Retrieval , 1983 .

[12]  Ray R. Larson,et al.  Bibliometrics of the World Wide Web: An Exploratory Analysis of the Intellectual Structure of Cyberspace , 1996 .

[13]  Soumen Chakrabarti,et al.  Distributed Hypertext Resource Discovery Through Examples , 1999, VLDB.

[14]  Susan T. Dumais,et al.  Inductive learning algorithms and representations for text categorization , 1998, CIKM '98.

[15]  Rakesh Agrawal,et al.  Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies , 1998, VLDB 1998.

[16]  Andrei Z. Broder,et al.  The Connectivity Server: Fast Access to Linkage Information on the Web , 1998, Comput. Networks.

[17]  M. Mizruchi,et al.  Techniques for Disaggregating Centrality Scores in Social Networks , 1986 .

[18]  Martin van den Berg,et al.  Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery , 1999, Comput. Networks.

[19]  Monika Henzinger,et al.  Finding Related Pages in the World Wide Web , 1999, Comput. Networks.

[20]  Krishna Bharat,et al.  Improved algorithms for topic distillation in a hyperlinked environment , 1998, SIGIR '98.

[21]  Daphne Koller,et al.  Hierarchically Classifying Documents Using Very Few Words , 1997, ICML.

[22]  Piotr Indyk,et al.  Enhanced hypertext categorization using hyperlinks , 1998, SIGMOD '98.

[23]  John Scott.  What is social network analysis , 2010 .

[24]  C. Lee Giles,et al.  Accessibility of information on the Web , 2000, INTL.