Web page title extraction and its application
Shuming Shi | Ruihua Song | Hang Li | Chin-Yew Lin | Yunbo Cao | Yunhua Hu | Guomao Xin | Yewei Xue
[1] Jihoon Yang,et al. Knowledge-based metadata extraction from PostScript files , 2000, DL '00.
[2] James P. Callan,et al. Combining Structural Information and the Use of Priors in Mixed Named-Page and Homepage Finding , 2003, TREC.
[3] Stephen E. Robertson,et al. Simple BM25 extension to multiple weighted fields , 2004, CIKM '04.
[4] Andrew McCallum,et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.
[5] Shuming Shi,et al. Title extraction from bodies of HTML documents and its application to web page retrieval , 2005, SIGIR '05.
[6] Maurice Bruynooghe,et al. Information Extraction from Web Documents Based on Local Unranked Tree Automaton Inference , 2003, IJCAI.
[7] Tao Qin,et al. Microsoft Research Asia at Web Track and Terabyte Track of TREC 2004 , 2004, TREC.
[8] W. Bruce Croft,et al. Table extraction using conditional random fields , 2003, DG.O.
[9] Robert L. Grossman,et al. Mining data records in Web pages , 2003, KDD '03.
[10] Timothy C. Craven. HTML Tags as Extraction Cues for Web Page Description Construction , 2003, Informing Sci. Int. J. an Emerg. Transdiscipl.
[11] Xinxin Wang,et al. Tabular Abstraction, Editing, and Formatting , 1996 .
[12] Edward A. Fox,et al. Automatic document metadata extraction using support vector machines , 2003, Joint Conference on Digital Libraries.
[13] Line Eikvil,et al. Information Extraction from World Wide Web - A Survey , 1999 .
[14] Alberto H. F. Laender,et al. Automatic web news extraction using tree edit distance , 2004, WWW '04.
[15] Kevyn Collins-Thompson,et al. Information Filtering, Novelty Detection, and Named-Page Finding , 2002, TREC.
[16] Abdel Belaïd. Recognition of table of contents for electronic library consulting , 2001, International Journal on Document Analysis and Recognition.
[17] Yiqun Liu,et al. THU TREC 2002: Novelty Track Experiments , 2002, TREC.
[18] Valter Crescenzi,et al. Wrapping-oriented classification of web pages , 2002, SAC '02.
[19] Chia-Hui Chang,et al. IEPAD: information extraction based on pattern discovery , 2001, WWW '01.
[20] Dayne Freitag,et al. Machine Learning for Information Extraction in Informal Domains , 2000, Machine Learning.
[21] J. Scott Hawker,et al. SA_MetaMatch: relevant document discovery through document metadata and indexing , 2004, ACM-SE 42.
[22] David Hawking,et al. Overview of the TREC 2003 Web Track , 2003, TREC.
[23] Andrew McCallum,et al. Information Extraction with HMMs and Shrinkage , 1999 .
[24] James P. Callan,et al. Combining document representations for known-item search , 2003, SIGIR.
[25] Judith L. Klavans,et al. Columbia Newsblaster: Multilingual News Summarization on the Web , 2004, NAACL.
[26] David Carmel,et al. Topic Distillation with Knowledge Agents , 2002, TREC.
[27] John Shawe-Taylor,et al. The Perceptron Algorithm with Uneven Margins , 2002, ICML.
[28] Thorsten Joachims,et al. A statistical learning model of text classification for support vector machines , 2001, SIGIR '01.
[29] Wei-Ying Ma,et al. VIPS: a Vision-based Page Segmentation Algorithm , 2003 .
[30] T. Breuel. Information Extraction from HTML Documents by Structural Matching , 2003 .
[31] Valter Crescenzi,et al. RoadRunner: Towards Automatic Data Extraction from Large Web Sites , 2001, VLDB.
[32] Thorsten Joachims,et al. A Statistical Learning Model of Text Classification for Support Vector Machines , 2001, SIGIR '01.
[33] Min Zhang,et al. DF or IDF? On the Use of HTML Primary Feature Fields for Web IR , 2003, WWW.
[34] Bidyut Baran Chaudhuri,et al. Extraction of type style-based meta-information from imaged documents , 2001, International Journal on Document Analysis and Recognition.
[35] Maarten de Rijke,et al. Wrapper Generation via Grammar Induction , 2000, ECML.
[36] Weiyi Meng,et al. Using the Structure of HTML Documents to Improve Retrieval , 1997, USENIX Symposium on Internet Technologies and Systems.
[37] David Hawking,et al. Overview of the TREC 2004 Web Track , 2004, TREC.
[38] Wei-Ying Ma,et al. Learning block importance models for web pages , 2004, WWW '04.
[39] Craig A. Knoblock,et al. A hierarchical approach to wrapper induction , 1999, AGENTS '99.