Designing New Crawling and Indexing Techniques for Web Search Engines
[1] Mandar Mitra,et al. Information Retrieval from Documents: A Survey , 2000, Information Retrieval.
[2] Stephen E. Robertson,et al. Simple BM25 extension to multiple weighted fields , 2004, CIKM '04.
[3] Norman Abramson,et al. Information theory and coding , 1963 .
[4] Hector Garcia-Molina,et al. Effective page refresh policies for Web crawlers , 2003, TODS.
[5] J Patrick Bixler. Tracking text in mixed-mode documents , 2000, DOCPROCS '88.
[6] Filippo Menczer,et al. Evaluating topic-driven web crawlers , 2001, SIGIR '01.
[7] Erik Rauch,et al. A confidence-based framework for disambiguating geographic terms , 2003, HLT-NAACL 2003.
[8] Edward A. Fox,et al. ETANA-GIS: GIS for archaeological digital libraries , 2006, Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '06).
[9] Josef Kittler,et al. Pattern recognition : a statistical approach , 1982 .
[10] Rajeev Motwani,et al. The PageRank Citation Ranking : Bringing Order to the Web , 1999, WWW 1999.
[11] Serguei Levachkine,et al. Text/Graphics Separation and Recognition in Raster-Scanned Color Cartographic Maps , 2003, GREC.
[12] Thorsten Joachims,et al. Making large-scale support vector machine learning practical , 1999 .
[13] Marc Ehrig,et al. Ontology-focused crawling of Web documents , 2003, SAC '03.
[14] Marc Najork,et al. Mercator: A scalable, extensible Web crawler , 1999, World Wide Web.
[15] Alexandros Ntoulas,et al. Effective Change Detection Using Sampling , 2002, VLDB.
[16] David J. C. MacKay,et al. Information Theory, Inference, and Learning Algorithms , 2004, IEEE Transactions on Information Theory.
[17] Yoelle Maarek,et al. The Shark-Search Algorithm. An Application: Tailored Web Site Mapping , 1998, Comput. Networks.
[18] Prasenjit Mitra,et al. Automatic Extraction of Data from 2-D Plots in Documents , 2007 .
[19] Kevin S. McCurley,et al. Geospatial mapping and navigation of the web , 2001, WWW '01.
[20] Craig A. Knoblock,et al. Automatic extraction of road intersections from raster maps , 2005, GIS '05.
[21] Christopher Olston,et al. What's new on the web?: the evolution of the web from a search engine perspective , 2004, WWW '04.
[22] Kun Bai,et al. TableRank: A Ranking Algorithm for Table Search and Retrieval , 2007, AAAI.
[23] Junghoo Cho,et al. Impact of search engines on page popularity , 2004, WWW '04.
[24] Sougata Mukherjea,et al. WTMS: a system for collecting and analyzing topic-specific Web information , 2000, Comput. Networks.
[25] Kam-Fai Wong,et al. A retrospective study of a hybrid document-context based retrieval model , 2007, Inf. Process. Manag..
[26] Belur V. Dasarathy,et al. Nearest neighbor (NN) norms: NN pattern classification techniques , 1991 .
[27] Shlomo Moran,et al. Predictive caching and prefetching of query results in search engines , 2003, WWW '03.
[28] Anja Feldmann,et al. Rate of Change and other Metrics: a Live Study of the World Wide Web , 1997, USENIX Symposium on Internet Technologies and Systems.
[29] Kalina Bontcheva,et al. GATE: an Architecture for Development of Robust HLT applications , 2002, ACL.
[30] Ana Carolina Salgado,et al. Looking at both the present and the past to efficiently update replicas of web content , 2005, WIDM '05.
[31] Herbert Van de Sompel,et al. The open archives initiative: building a low-barrier interoperability framework , 2001, JCDL '01.
[32] Mike Thelwall,et al. Citation and hyperlink networks , 2005 .
[33] Marco Gori,et al. Towards Next Generation CiteSeer: A Flexible Architecture for Digital Library Deployment , 2006, ECDL.
[34] Michael E. Lesk,et al. Creating a searchable map library via data mining , 2008, JCDL '08.
[35] George Cybenko,et al. How dynamic is the Web? , 2000, Comput. Networks.
[36] Christos Faloutsos,et al. An Efficient Pictorial Database System for PSQL , 1988, IEEE Trans. Software Eng..
[37] Ron Sivan,et al. Web-a-where: geotagging web content , 2004, SIGIR '04.
[38] Patrice Enjalbert,et al. Geographic reference analysis for geographic document querying , 2003, HLT-NAACL 2003.
[39] Dragutin Petkovic,et al. Query by Image and Video Content: The QBIC System , 1995, Computer.
[40] Ravi Kumar,et al. Visualizing tags over time , 2007, ACM Trans. Web.
[41] Shih-Fu Chang,et al. Image Retrieval: Current Techniques, Promising Directions, and Open Issues , 1999, J. Vis. Commun. Image Represent..
[42] Marty Himmelstein. Local Search: The Internet Is the Yellow Pages , 2005, Computer.
[43] Mário J. Silva,et al. Challenges and resources for evaluating geographical IR , 2005, GIR '05.
[44] Kun Bai,et al. TableSeer: automatic table metadata extraction and searching in digital libraries , 2007, JCDL '07.
[45] Christos Faloutsos,et al. Sampling from large graphs , 2006, KDD '06.
[46] Hector Garcia-Molina,et al. Crawler-Friendly Web Servers , 2000, PERV.
[47] Philip S. Yu,et al. Optimal crawling strategies for web search engines , 2002, WWW '02.
[48] Geert-Jan Houben,et al. Information Retrieval in Distributed Hypertexts , 1994, RIAO.
[49] Jian-Kang Wu. Content-Based Indexing of Multimedia Databases , 1997, IEEE Trans. Knowl. Data Eng..
[50] Claudia Bauzer Medeiros,et al. Discovering geographic locations in web pages using urban addresses , 2007, GIR '07.
[51] Luis Gravano,et al. Exploiting Geographical Location Information of Web Pages , 1999, WebDB.
[52] Simone Santini,et al. Integrated browsing and querying for image databases , 2000, IEEE MultiMedia.
[53] R. A. Doney,et al. 4. Probability and Random Processes , 1993 .
[54] Costas Armenakis,et al. Survey of Work on Road Extraction in Aerial and Satellite Images , 2002 .
[55] James Ze Wang,et al. SIMPLIcity: Semantics-Sensitive Integrated Matching for Picture LIbraries , 2001, IEEE Trans. Pattern Anal. Mach. Intell..
[56] Jon Kleinberg,et al. Authoritative sources in a hyperlinked environment , 1999, SODA '98.
[57] Hector Garcia-Molina,et al. The Evolution of the Web and Implications for an Incremental Crawler , 2000, VLDB.
[58] John N. Tsitsiklis,et al. Introduction to Probability , 2002 .
[59] Dan Wu,et al. On assigning place names to geography related web pages , 2005, Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '05).
[60] Thorsten Joachims,et al. Optimizing search engines using clickthrough data , 2002, KDD.
[61] Edward A. Fox,et al. Automatic document metadata extraction using support vector machines , 2003, 2003 Joint Conference on Digital Libraries, 2003. Proceedings..
[62] Cheng Niu,et al. Location Normalization for Information Extraction , 2002, COLING.
[63] Hector Garcia-Molina,et al. Synchronizing a database to improve freshness , 2000, SIGMOD 2000.
[64] Mohammad Zubair,et al. Search engine coverage of the OAI-PMH corpus , 2006, IEEE Internet Computing.
[65] Christopher S. G. Khoo,et al. G-Portal: a map-based digital library for distributed geospatial and georeferenced resources , 2002, JCDL '02.
[66] Luis Gravano,et al. Computing Geographical Scopes of Web Resources , 2000, VLDB.
[67] Roy H. Campbell,et al. Internet search engine freshness by Web server help , 2001, Proceedings 2001 Symposium on Applications and the Internet.
[68] Edward A. Fox,et al. ETANA-ADD: an interactive tool for integrating archaeological DL collections , 2006, Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '06).
[69] C. Lee Giles,et al. Designing clustering-based web crawling policies for search engine crawlers , 2007, CIKM '07.
[70] Ross Wilkinson,et al. Effective retrieval of structured documents , 1994, SIGIR '94.
[72] C. Lee Giles,et al. Efficiently Detecting Webpage Updates Using Samples , 2007, ICWE.
[73] Hector Garcia-Molina,et al. Efficient Crawling Through URL Ordering , 1998, Comput. Networks.
[74] James P. Callan,et al. Combining document representations for known-item search , 2003, SIGIR.
[75] Essam A. El-Kwae,et al. Efficient content-based indexing of large image databases , 2000, TOIS.
[76] Jochen L. Leidner,et al. Grounding spatial named entities for information extraction and question answering , 2003, HLT-NAACL 2003.
[77] Vladimir N. Vapnik,et al. The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.
[78] Hanan Samet,et al. MAGELLAN: Map Acquisition of GEographic Labels by Legend ANalysis , 1998, International Journal on Document Analysis and Recognition.
[79] Andrew W. Moore,et al. X-means: Extending K-means with Efficient Estimation of the Number of Clusters , 2000, ICML.
[80] George Cybenko,et al. Keeping up with the changing Web , 2000, Computer.
[81] Edward A. Fox,et al. A Content-Based Image Retrieval Service for Archaeology Collections , 2006, ECDL.
[82] Sandip Debnath,et al. Learning metadata from the evidence in an on-line citation matching scheme , 2006, Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '06).
[83] Sandeep Pandey,et al. User-centric Web crawling , 2005, WWW '05.
[84] Marc Najork,et al. A large-scale study of the evolution of Web pages , 2004, Softw. Pract. Exp..
[85] Gerard Salton,et al. A vector space model for automatic indexing , 1975, CACM.
[86] Sung-Hyon Myaeng,et al. A flexible model for retrieval of SGML documents , 1998, SIGIR '98.
[87] C. Lee Giles,et al. Extraction and search of chemical formulae in text documents on the web , 2007, WWW '07.
[88] Cheng Niu,et al. InfoXtract: A Customizable Intermediate Level Information Extraction Engine , 2003, Natural Language Engineering.
[89] Marcel Worring,et al. Content-Based Image Retrieval at the End of the Early Years , 2000, IEEE Trans. Pattern Anal. Mach. Intell..
[90] M. Sanderson,et al. Analyzing geographic queries , 2004 .
[91] Martin van den Berg,et al. Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery , 1999, Comput. Networks.
[92] Philip S. Yu,et al. Intelligent crawling on the World Wide Web with arbitrary predicates , 2001, WWW '01.
[93] Guoray Cai. GeoVIBE: A Visual Interface for Geographic Digital Libraries , 2002, Visual Interfaces to Digital Libraries.
[94] Mounia Lalmas. Uniform Representation of Content and Structure for structured document retrieval , 2001 .
[95] King-Sun Fu,et al. Query-by-Pictorial-Example , 1980, IEEE Trans. Software Eng..
[96] Ingemar J. Cox,et al. The Bayesian image retrieval system, PicHunter: theory, implementation, and psychophysical experiments , 2000, IEEE Trans. Image Process..
[97] Ioannis A. Kakadiaris,et al. Understanding diagrams in technical documents , 1992, Computer.
[98] William C. Schefler,et al. Statistics: Concepts and Applications , 1988 .
[99] G Salton,et al. Developments in Automatic Text Retrieval , 1991, Science.
[100] Hyun Chul Lee,et al. Geographically-Sensitive Link Analysis , 2007, IEEE/WIC/ACM International Conference on Web Intelligence (WI'07).
[101] Jenny Edwards,et al. An adaptive model for optimizing performance of an incremental web crawler , 2001, WWW '01.
[102] Sandeep Pandey,et al. Recrawl scheduling based on information longevity , 2008, WWW.
[103] Sergey Brin,et al. The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.
[104] Michael Stonebraker,et al. Chabot: Retrieval from a Relational Database of Images , 1995, Computer.
[105] Chew Lim Tan,et al. Text/Graphics Separation in Maps , 2001, GREC.
[106] Judit Bar-Ilan,et al. Methods for comparing rankings of search engine results , 2005, Comput. Networks.
[107] Cheng Niu,et al. InfoXtract: a customizable intermediate level information extraction engine , 2003, HLT-NAACL 2003.
[108] Clement T. Yu,et al. Techniques and Systems for Image and Video Retrieval , 1999, IEEE Trans. Knowl. Data Eng..
[109] C. Lee Giles,et al. Classification of source code archives , 2003, SIGIR '03.
[110] Monika Henzinger,et al. Analysis of a very large web search engine query log , 1999, SIGF.
[111] José Luis Borbinha,et al. Geographically-aware information retrieval for collections of digitized historical maps , 2007, GIR '07.
[112] Hanan Samet,et al. MARCO: MAp Retrieval by COntent , 1996, IEEE Trans. Pattern Anal. Mach. Intell..
[113] Hugh E. Williams,et al. What's Changed? Measuring Document Change in Web Crawling for Search Engines , 2003, SPIRE.
[114] James Ze Wang,et al. Automatic categorization of figures in scientific documents , 2006, Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '06).
[115] Gregory R. Crane,et al. Disambiguating Geographic Names in a Historical Digital Library , 2001, ECDL.
[116] Yang Song,et al. CiteSeerχ: a scalable autonomous scientific digital library , 2006, InfoScale '06.
[117] George Karypis,et al. Fast supervised dimensionality reduction algorithm with applications to document categorization & retrieval , 2000, CIKM '00.
[118] C. Lee Giles,et al. Digital Libraries and Autonomous Citation Indexing , 1999, Computer.