Collecte orientée sur le Web pour la recherche d'information spécialisée. (Focused document gathering on the Web for domain-specific information retrieval)
[1] Yasuhiko Kitamura,et al. Keyword Spices: A New Method for Building Domain-Specific Web Search Engines , 2001, IJCAI.
[2] Hector Garcia-Molina,et al. Efficient Crawling Through URL Ordering , 1998, Comput. Networks.
[3] Inderjit S. Dhillon,et al. Co-clustering documents and words using bipartite spectral graph partitioning , 2001, KDD '01.
[4] Luis Gravano,et al. Snowball: extracting relations from large plain-text collections , 2000, DL '00.
[5] William H. Fletcher,et al. Concordancing the Web with KWiCFinder , 2001 .
[6] Mounia Lalmas,et al. Workshop on aggregated search , 2008, SIGF.
[7] Gerard Salton,et al. Term-Weighting Approaches in Automatic Text Retrieval , 1988, Inf. Process. Manag.
[8] Ricardo A. Baeza-Yates,et al. Crawling a country: better strategies than breadth-first for web page ordering , 2005, WWW '05.
[9] Zhen Liu,et al. Optimal Robot Scheduling for Web Search Engines , 1998 .
[10] Marco Baroni,et al. Building general- and special-purpose corpora by Web crawling , 2006 .
[11] Parikshit Sondhi,et al. Using query context models to construct topical search engines , 2010, IIiX.
[12] Lakshminarayanan Subramanian,et al. Contextual Information Portals , 2010, AAAI Spring Symposium: Artificial Intelligence for Development.
[13] Jianbo Shi,et al. A Random Walks View of Spectral Segmentation , 2001, AISTATS.
[14] Zhaohui Zheng,et al. Learning to model relatedness for news recommendation , 2011, WWW.
[15] Yong Yu,et al. Identifying ambiguous queries in web search , 2007, WWW '07.
[16] David Hawking,et al. Quality and relevance of domain-specific search: A case study in mental health , 2006, Information Retrieval.
[17] Iadh Ounis,et al. A study of the dirichlet priors for term frequency normalisation , 2005, SIGIR '05.
[18] Antoinette Renouf,et al. WebCorp: an integrated system for web text search , 2007 .
[19] Fernando Diaz,et al. Sources of evidence for vertical selection , 2009, SIGIR.
[20] Alexander Mehler,et al. Genres on the Web: Computational Models and Empirical Studies , 2010 .
[21] Rada Mihalcea,et al. Random-Walk Term Weighting for Improved Text Classification , 2006, International Conference on Semantic Computing (ICSC 2007).
[22] Peter Norvig,et al. The Unreasonable Effectiveness of Data , 2009, IEEE Intelligent Systems.
[23] M. Newman. Analysis of weighted networks , 2004, Physical review. E, Statistical, nonlinear, and soft matter physics.
[24] Carl Lagoze,et al. Focused Crawls, Tunneling, and Digital Libraries , 2002, ECDL.
[25] Filippo Menczer,et al. A General Evaluation Framework for Topical Crawlers , 2005, Information Retrieval.
[26] Filippo Menczer,et al. Topical Crawling for Business Intelligence , 2003, ECDL.
[27] Hang Li. Learning to Rank for Information Retrieval and Natural Language Processing , 2011, Synthesis Lectures on Human Language Technologies.
[28] Chih-Jen Lin,et al. Dual coordinate descent methods for logistic regression and maximum entropy models , 2011, Machine Learning.
[29] Marco Gori,et al. Focused Crawling Using Context Graphs , 2000, VLDB.
[30] Ophir Frieder,et al. Predicting query difficulty on the web by learning visual clues , 2005, SIGIR '05.
[31] Kristian J. Hammond,et al. Watson: Anticipating and Contextualizing Information Needs , 1999 .
[32] M. Narasimha Murty,et al. Focused crawling with scalable ordinal regression solvers , 2007, ICML '07.
[33] Sergey Brin,et al. Extracting Patterns and Relations from the World Wide Web , 1998, WebDB.
[34] Padmini Srinivasan,et al. Link Contexts in Classifier-Guided Topical Crawlers , 2006, IEEE Trans. Knowl. Data Eng.
[35] Chun Chen,et al. Guide focused crawler efficiently and effectively using on-line topical importance estimation , 2008, SIGIR '08.
[36] Thorsten Brants,et al. Large Language Models in Machine Translation , 2007, EMNLP.
[37] Taher H. Haveliwala. Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search , 2003, IEEE Trans. Knowl. Data Eng.
[38] P. Diaconis. Group representations in probability and statistics , 1988 .
[39] Burr Settles,et al. Active Learning Literature Survey , 2009 .
[40] Jialun Qin,et al. Building domain-specific Web collections for scientific digital libraries: a meta-search enhanced focused crawling method , 2004, Proceedings of the 2004 Joint ACM/IEEE Conference on Digital Libraries, 2004.
[41] Sanjay Ghemawat,et al. MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.
[42] Ellen Riloff,et al. Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping , 1999, AAAI/IAAI.
[43] Fabrizio Sebastiani,et al. Machine learning in automated text categorization , 2001, CSUR.
[44] Peter Fankhauser,et al. Boilerplate detection using shallow text features , 2010, WSDM '10.
[45] Michael Kluck,et al. Evaluation of Cross-Language Information Retrieval Using the Domain-Specific GIRT Data as Parallel German-English Corpus , 2004, LREC.
[46] Marc Najork,et al. Detecting spam web pages through content analysis , 2006, WWW '06.
[47] Michele Banko,et al. Scaling to Very Very Large Corpora for Natural Language Disambiguation , 2001, ACL.
[48] Martin van den Berg,et al. Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery , 1999, Comput. Networks.
[49] Stefan Evert. A Lightweight and Efficient Tool for Cleaning Web Pages , 2008, LREC.
[50] Kyo Kageura,et al. Methods of Automatic Term Recognition: A Review , 1996 .
[51] Serge Sharoff. Creating General-Purpose Corpora Using Automated Search Engine Queries , 2006 .
[52] Clément de Groc,et al. Experiments on Pseudo Relevance Feedback Using Graph Random Walks , 2012, SPIRE.
[53] Philip S. Yu,et al. Building text classifiers using positive and unlabeled examples , 2003, Third IEEE International Conference on Data Mining.
[54] Ziv Bar-Yossef,et al. Random sampling from a search engine's index , 2006, WWW '06.
[55] W. Knight. A Computer Method for Calculating Kendall's Tau with Ungrouped Data , 1966 .
[56] Rayid Ghani,et al. Learning a monolingual language model from a multilingual text database , 2000, CIKM '00.
[57] Ayman Farahat,et al. Authority Rankings from HITS, PageRank, and SALSA: Existence, Uniqueness, and Effect of Initialization , 2005, SIAM J. Sci. Comput.
[58] Chun Chen,et al. Quantify query ambiguity using ODP metadata , 2007, SIGIR.
[59] Marc Najork,et al. Spam, damn spam, and statistics: using statistical analysis to locate spam web pages , 2004, WebDB '04.
[60] William H. Fletcher. Making the Web More Useful as a Source for Linguistic Corpora , 2004 .
[61] Moni Naor,et al. Rank aggregation methods for the Web , 2001, WWW '01.
[62] Silvia Bernardini,et al. BootCaT: Bootstrapping Corpora and Terms from the Web , 2004, LREC.
[63] Jimmy J. Lin,et al. Book Reviews: Data-Intensive Text Processing with MapReduce by Jimmy Lin and Chris Dyer , 2010, CL.
[64] Milad Shokouhi,et al. Federated Search , 2011, Found. Trends Inf. Retr.
[65] Geoffrey Williams,et al. METRICC: Harnessing comparable corpora for multilingual lexicon development , 2012 .
[66] Jason Rennie,et al. Efficient Web Spidering with Reinforcement Learning , 1999 .
[67] Clément de Groc,et al. GrawlTCQ: Terminology and Corpora Building by Ranking Simultaneously Terms, Queries and Documents using Graph Random Walks , 2011, Graph-based Methods for Natural Language Processing.
[68] Silvia Bernardini,et al. Introducing and evaluating ukWaC, a very large web-derived corpus of English , 2008 .
[69] Robert Steele,et al. Techniques for specialized search engines , 2001 .
[70] W. Bruce Croft,et al. Linear feature-based models for information retrieval , 2007, Information Retrieval.
[71] Barry Smyth,et al. Fact or Fiction: Content Classification for Digital Libraries , 2001, DELOS.
[72] Amy Nicole Langville,et al. A Survey of Eigenvector Methods for Web Information Retrieval , 2005, SIAM Rev.
[73] Dawid Weiss,et al. A survey of Web clustering engines , 2009, CSUR.
[74] Christopher D. Manning,et al. Introduction to Information Retrieval , 2010, J. Assoc. Inf. Sci. Technol.
[75] Serge Abiteboul,et al. Adaptive on-line page importance computation , 2003, WWW '03.
[76] Milad Shokouhi,et al. From federated to aggregated search , 2010, SIGIR.
[77] Soumen Chakrabarti,et al. Accelerated focused crawling through online relevance feedback , 2002, WWW.
[78] Silvia Bernardini,et al. The WaCky wide web: a collection of very large linguistically processed web-crawled corpora , 2009, Lang. Resour. Evaluation.
[79] Jenny Edwards,et al. An adaptive model for optimizing performance of an incremental web crawler , 2001, WWW '01.
[80] W. Bruce Croft,et al. Predicting query performance , 2002, SIGIR '02.
[81] Rayid Ghani,et al. Building Minority Language Corpora by Learning to Generate Web Search Queries , 2003, Knowledge and Information Systems.
[82] Bo Pang,et al. Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales , 2005, ACL.
[83] William W. Cohen,et al. Language-Independent Set Expansion of Named Entities Using the Web , 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).
[84] Clément de Groc,et al. Mining Product Features from the Web: A Self-supervised Approach , 2012, WEBIST.
[85] Cristina V. Lopes,et al. Bagging gradient-boosted trees for high precision, low variance ranking models , 2011, SIGIR.
[86] Rajeev Motwani,et al. The PageRank Citation Ranking: Bringing Order to the Web , 1999, WWW 1999.
[87] Martin F. Porter,et al. An algorithm for suffix stripping , 1997, Program.
[88] Sandeep Pandey,et al. Crawl ordering by search impact , 2008, WSDM '08.
[89] Eli Upfal,et al. The Web as a graph , 2000, PODS.
[90] Qiang Wu,et al. Adapting boosting for information retrieval measures , 2010, Information Retrieval.
[91] Claudia Hauff,et al. Predicting the effectiveness of queries and retrieval systems , 2010, SIGF.
[92] Clément de Groc,et al. Self-supervised Product Feature Extraction using a Knowledge Base and Visual Clues , 2012, WEBIST.
[93] George Cybenko,et al. Keeping up with the changing Web , 2000, Computer.
[94] Ben Choi,et al. Web Page Classification , 2005 .
[95] Thore Graepel,et al. Large Margin Rank Boundaries for Ordinal Regression , 2000 .
[96] Chih-Jen Lin,et al. LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res.
[97] Jeremy T. Bradley,et al. PageRank: Splitting Homogeneous Singular Linear Systems of Index One , 2009, ICTIR.
[98] Hector Garcia-Molina,et al. Estimating frequency of change , 2003, TOIT.
[99] Yi Chang,et al. Yahoo! Learning to Rank Challenge Overview , 2010, Yahoo! Learning to Rank Challenge.
[100] Jiawei Han,et al. PEBL: Web page classification without negative examples , 2004, IEEE Transactions on Knowledge and Data Engineering.
[101] O Baujard,et al. MARVIN, multi-agent softbot to retrieve multilingual medical information on the Web , 1998, Medical informatics = Medecine et informatique.
[102] Malik Yousef,et al. One-Class SVMs for Document Classification , 2002, J. Mach. Learn. Res.
[103] Soumen Chakrabarti. Interactive Focused Crawler: Setup, Monitoring and Control through User Feedback , 2003 .
[104] Eneko Agirre,et al. Personalizing PageRank for Word Sense Disambiguation , 2009, EACL.
[105] Patrick Gallinari,et al. Document structure meets page layout: loopy random fields for web news content extraction , 2010, DocEng '10.
[106] Christina Lioma,et al. Random walk term weighting for information retrieval , 2007, SIGIR.
[107] Emine Yilmaz,et al. Document selection methodologies for efficient and effective learning-to-rank , 2009, SIGIR.
[108] Fernando Diaz,et al. Learning to aggregate vertical results into web search results , 2011, CIKM '11.
[109] William P. Birmingham,et al. Improving category specific Web search by learning query modifications , 2001, Proceedings 2001 Symposium on Applications and the Internet.
[110] Clément de Groc. Babouk: Focused Web Crawling for Corpus Compilation and Automatic Terminology Extraction , 2011, 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology.
[111] Wei-Ying Ma,et al. Object-level ranking: bringing order to Web objects , 2005, WWW '05.
[112] D. Sculley,et al. Large Scale Learning to Rank , 2009 .
[113] B. Huberman,et al. The Deep Web: Surfacing Hidden Value , 2000 .
[114] Jean-Daniel Fekete,et al. Overlaying Graph Links on Treemaps , 2003 .
[115] Maged M. Michael,et al. Scale-up x Scale-out: A Case Study using Nutch/Lucene , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[116] Hector Garcia-Molina,et al. Parallel crawlers , 2002, WWW.
[117] Filippo Menczer,et al. Topical web crawlers: Evaluating adaptive algorithms , 2004, TOIT.
[118] Tao Qin,et al. LETOR: A benchmark collection for research on learning to rank for information retrieval , 2010, Information Retrieval.
[119] Adam Kilgarriff,et al. Introduction to the Special Issue on the Web as Corpus , 2003, CL.
[120] Kevin Duh,et al. Learning to rank with partially-labeled data , 2008, SIGIR '08.
[121] Matthew Richardson,et al. The Intelligent surfer: Probabilistic Combination of Link and Content Information in PageRank , 2001, NIPS.
[122] Hector Garcia-Molina,et al. Effective page refresh policies for Web crawlers , 2003, TODS.
[123] Eric Gaussier,et al. Une nouvelle approche à l'extraction de lexiques bilingues à partir de corpus comparables , 2007 .
[124] Xin Jiang,et al. A ranking approach to keyphrase extraction , 2009, SIGIR.
[125] William W. Cohen,et al. Contextual search and name disambiguation in email using graphs , 2006, SIGIR.
[126] Huaiyu Zhu. On Information and Sufficiency , 1997 .
[127] Alexander Dekhtyar,et al. Information Retrieval , 2018, Lecture Notes in Computer Science.
[128] Marc Najork,et al. Web Crawling , 2010, Found. Trends Inf. Retr.
[129] Jaana Kekäläinen,et al. Cumulated gain-based evaluation of IR techniques , 2002, TOIS.
[130] Reinier Post,et al. Information Retrieval in the World-Wide Web: Making Client-Based Searching Feasible , 1994, Comput. Networks ISDN Syst.
[131] Tie-Yan Liu,et al. Learning to rank for information retrieval , 2009, SIGIR.
[132] Andrei Z. Broder,et al. Identifying and Filtering Near-Duplicate Documents , 2000, CPM.
[133] Clément de Groc,et al. Un critère de cohésion thématique fondé sur un graphe de cooccurrences (Topical Cohesion using Graph Random Walks) [in French] , 2012, JEP-TALN-RECITAL.
[134] Eric Brill,et al. Beyond PageRank: machine learning for static ranking , 2006, WWW '06.
[135] Hector Garcia-Molina,et al. Web Spam Taxonomy , 2005, AIRWeb.
[136] Adam Rifkin,et al. Nutch: A Flexible and Scalable Open-Source Web Search Engine , 2005 .
[137] Andrei Z. Broder,et al. A Technique for Measuring the Relative Size and Overlap of Public Web Search Engines , 1998, Comput. Networks.
[138] Carlos Castillo,et al. Effective web crawling , 2005, SIGF.
[139] J. Friedman. Greedy function approximation: A gradient boosting machine , 2001 .
[140] David Hawking,et al. Quality-Oriented Search for Depression Portals , 2009, ECIR.
[141] M. de Rijke,et al. Using Coherence-Based Measures to Predict Query Difficulty , 2008, ECIR.
[142] Padmini Srinivasan,et al. Learning to crawl: Comparing classification schemes , 2005, TOIS.
[143] Christopher D. Manning,et al. Random Walks for Text Semantic Similarity , 2009, Graph-based Methods for Natural Language Processing.
[144] Lee Gillam,et al. University of Surrey Participation in TREC8: Weirdness Indexing for Logical Document Extrapolation and Retrieval (WILDER) , 1999, TREC.
[145] Fredric C. Gey,et al. The Domain-Specific Task of CLEF - Specific Evaluation Strategies in Cross-Language Information Retrieval , 2000, CLEF.
[146] Filippo Menczer,et al. ARACHNID: Adaptive Retrieval Agents Choosing Heuristic Neighborhoods for Information Discovery , 1997, ICML 1997.
[147] Jan Pomikálek. Removing Boilerplate and Duplicate Content from Web Corpora , 2011 .
[148] Filippo Menczer,et al. MySpiders: Evolve Your Own Intelligent Web Crawlers , 2002, Autonomous Agents and Multi-Agent Systems.
[149] Adam Kilgarriff,et al. Cleaneval: a Competition for Cleaning Web Pages , 2008, LREC.