Web Retrieval and Mining

The advent of the Web in the mid-1990s, followed by its rapid adoption, posed significant challenges to the classical information retrieval methods developed in the 1970s and 1980s. The main challenges are that the Web is massive, dynamic, and distributed. The two main types of tasks carried out on the Web are searching and mining. Searching is locating information that satisfies a given information need, and mining is extracting information and/or knowledge from a corpus. The metrics for success when carrying out these tasks on the Web include precision, recall (completeness), freshness, and efficiency.
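
Of the metrics named above, precision and recall have standard set-based definitions: precision is the fraction of retrieved items that are relevant, and recall is the fraction of relevant items that were retrieved. The following is a minimal sketch under those standard definitions; the function name `precision_recall` and the page names are hypothetical and chosen only for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: items the system returned (hypothetical example input)
    relevant:  items judged relevant to the information need
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)  # relevant items that were actually returned
    # Guard against empty sets to avoid division by zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative data only: four pages returned, three pages truly relevant.
retrieved_pages = ["a.html", "b.html", "c.html", "d.html"]
relevant_pages = ["a.html", "c.html", "e.html"]
precision, recall = precision_recall(retrieved_pages, relevant_pages)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

On the Web the full set of relevant pages is rarely known, so recall can usually only be estimated, which is one reason freshness and efficiency are tracked alongside it.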
