Web Information Resource Discovery: Past, Present, and Future

In the span of twelve years, the World Wide Web has become the major information repository for the masses and the world. It is only a computer and an internet connection away from anybody anywhere, and it holds abundant and diverse information, some of it incorrect, redundant, spam, or otherwise bad. The web is becoming all things to all people, oblivious to national and continental boundaries, promising mostly free information to all, and quickly growing into a repository in all languages and all cultures. With large digital libraries and increasingly significant educational resources, the web is becoming an equalizer, a balancing force, and an opportunity for all, especially for underdeveloped and developing countries. The web is both exciting and overwhelming, changing the way the world communicates, from the way business is conducted to the way the masses are educated, and from the way research is performed to the way its results are disseminated. It is fair to say that the web will only get larger, more diverse, and more chaotic in the near future.
