Efficient estimation of the size of text deep web data source
[1] Petros Zerfos, et al. Downloading textual hidden web content through keyword queries, 2005, Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '05).
[2] Luis Gravano, et al. Probe, count, and classify: categorizing hidden web databases, 2001, SIGMOD '01.
[3] Sheng Wu, et al. Estimating collection size with logistic regression, 2007, SIGIR.
[4] Sofía N. Galicia-Haro, et al. Can We Correctly Estimate the Total Number of Pages in Google for a Specific Language?, 2003, CICLing.
[5] Chris Buckley, et al. OHSUMED: an interactive retrieval evaluation and new large test collection for research, 1994, SIGIR '94.
[6] Sriram Raghavan, et al. Crawling the Hidden Web, 2001, VLDB.
[7] David Hawking, et al. Evaluating sampling methods for uncooperative collections, 2007, SIGIR.
[8] Otis Gospodnetic, et al. Lucene in Action, 2004.
[9] Milad Shokouhi, et al. Capturing collection size for distributed non-cooperative retrieval, 2006, SIGIR.
[10] Valter Crescenzi, et al. RoadRunner: Towards Automatic Data Extraction from Large Web Sites, 2001, VLDB.
[11] Ken Lang, et al. NewsWeeder: Learning to Filter Netnews, 1995, ICML.
[12] Antonio Gulli, et al. The indexable web is more than 11.5 billion pages, 2005, WWW '05.
[13] Juliana Freire, et al. Siphoning Hidden-Web Data through Keyword-Based Interfaces, 2010, J. Inf. Data Manag.
[14] Andrei Z. Broder, et al. Estimating corpus size via queries, 2006, CIKM '06.
[15] Andrei Z. Broder, et al. A Technique for Measuring the Relative Size and Overlap of Public Web Search Engines, 1998, Comput. Networks.
[16] Wei-Ying Ma, et al. Query Selection Techniques for Efficient Crawling of Structured Web Sources, 2006, 22nd International Conference on Data Engineering (ICDE'06).
[17] L. Holst. A Unified Approach to Limit Theorems for Urn Models, 1979.
[18] Michael L. Nelson, et al. Efficient, automatic web resource harvesting, 2006, WIDM '06.
[19] Ziv Bar-Yossef, et al. Random sampling from a search engine's index, 2006, WWW '06.
[20] Ling Liu, et al. Probe, cluster, and discover: focused extraction of QA-Pagelets from the deep Web, 2004, Proceedings. 20th International Conference on Data Engineering.
[21] Stephen E. Fienberg, et al. How Large Is the World Wide Web, 2004.
[22] David W. Embley, et al. Extracting Data behind Web Forms, 2002, ER.
[23] B. Huberman, et al. The Deep Web: Surfacing Hidden Value, 2000.
[24] A. Chao. Estimating the population size for capture-recapture data with unequal catchability, 1987, Biometrics.
[25] James P. Callan, et al. Query-based sampling of text databases, 2001, TOIS.
[26] Shengli Wu, et al. Experiments with Document Archive Size Detection, 2003, ECIR.
[27] Sourav S. Bhowmick, et al. DEQUE: querying the deep web, 2005, Data Knowl. Eng.
[28] Bryan F. J. Manly, et al. Handbook of Capture-Recapture Analysis, 2010.
[29] Paul Bourret. How to Estimate the Sizes of Domains, 1984, Inf. Process. Lett.
[30] H. S. Heaps, et al. Information retrieval, computational and theoretical aspects, 1978.
[31] Ziv Bar-Yossef, et al. Efficient search engine measurements, 2007, WWW '07.