Title extraction from bodies of HTML documents and its application to web page retrieval
Shuming Shi | Ruihua Song | Hang Li | Guoping Hu | Yunhua Hu | Yunbo Cao | Guomao Xin
[1] Weiyi Meng, et al. Using the Structure of HTML Documents to Improve Retrieval, 1997, USENIX Symposium on Internet Technologies and Systems.
[2] Line Eikvil, et al. Information Extraction from World Wide Web - A Survey, 1999.
[3] Craig A. Knoblock, et al. A hierarchical approach to wrapper induction, 1999, AGENTS '99.
[4] Andrew McCallum, et al. Information Extraction with HMMs and Shrinkage, 1999.
[5] Andrew McCallum, et al. Information Extraction with HMM Structures Learned by Stochastic Optimization, 2000, AAAI/IAAI.
[6] Maarten de Rijke, et al. Wrapper Generation via Grammar Induction, 2000, ECML.
[7] Valter Crescenzi, et al. RoadRunner: Towards Automatic Data Extraction from Large Web Sites, 2001, VLDB.
[8] John Shawe-Taylor, et al. The Perceptron Algorithm with Uneven Margins, 2002, ICML.
[9] David Carmel, et al. Topic Distillation with Knowledge Agents, 2002, TREC.
[10] Yiqun Liu, et al. THU TREC 2002: Novelty Track Experiments, 2002, TREC.
[11] Proceedings of The Eleventh Text REtrieval Conference, TREC 2002, Gaithersburg, Maryland, USA, November 19-22, 2002, 2002, TREC.
[12] Valter Crescenzi, et al. Wrapping-oriented classification of web pages, 2002, SAC '02.
[13] Kevyn Collins-Thompson, et al. Information Filtering, Novelty Detection, and Named-Page Finding, 2002, TREC.
[14] James P. Callan, et al. Combining Structural Information and the Use of Priors in Mixed Named-Page and Homepage Finding, 2003, TREC.
[15] Min Zhang, et al. DF or IDF? On the Use of HTML Primary Feature Fields for Web IR, 2003, WWW.
[16] Maurice Bruynooghe, et al. Information Extraction from Web Documents Based on Local Unranked Tree Automaton Inference, 2003, IJCAI.
[17] David Hawking, et al. Overview of the TREC 2003 Web Track, 2003, TREC.
[18] Timothy C. Craven. HTML Tags as Extraction Cues for Web Page Description Construction, 2003, Informing Sci. Int. J. an Emerg. Transdiscipl.
[19] James P. Callan, et al. Combining document representations for known-item search, 2003, SIGIR.
[20] Robert L. Grossman, et al. Mining data records in Web pages, 2003, KDD '03.
[21] T. Breuel. Information Extraction from HTML Documents by Structural Matching, 2003.
[22] Wei-Ying Ma, et al. Learning block importance models for web pages, 2004, WWW '04.
[23] Stephen E. Robertson, et al. Simple BM25 extension to multiple weighted fields, 2004, CIKM '04.
[24] J. Scott Hawker, et al. SA_MetaMatch: relevant document discovery through document metadata and indexing, 2004, ACM-SE 42.
[25] Dayne Freitag, et al. Machine Learning for Information Extraction in Informal Domains, 2000, Machine Learning.
[26] Alberto H. F. Laender, et al. Automatic web news extraction using tree edit distance, 2004, WWW '04.
[27] Tao Qin, et al. Microsoft Research Asia at Web Track and Terabyte Track of TREC 2004, 2004, TREC.
[28] Judith L. Klavans, et al. Columbia Newsblaster: Multilingual News Summarization on the Web, 2004, NAACL.