Exploiting content redundancy for web information extraction
Rajeev Rastogi | Pankaj Gulhane | Srinivasan H. Sengamedu | Ashwin Tengli
[1] Matthew Richardson,et al. Markov logic networks , 2006, Machine Learning.
[2] Sergey Brin,et al. Extracting Patterns and Relations from the World Wide Web , 1998, WebDB.
[3] Yida Wang,et al. Incorporating site-level knowledge to extract structured data from web forums , 2009, WWW '09.
[4] William W. Cohen. Integration of heterogeneous databases without common domains using queries based on textual similarity , 1998, SIGMOD '98.
[5] Daniel P. Lopresti,et al. Block Edit Models for Approximate String Matching , 1997, Theor. Comput. Sci.
[6] Raymond J. Mooney,et al. Adaptive duplicate detection using learnable string similarity measures , 2003, KDD '03.
[7] Wei-Ying Ma,et al. Simultaneous record detection and attribute labeling in web data extraction , 2006, KDD '06.
[8] Surajit Chaudhuri,et al. A Primitive Operator for Similarity Joins in Data Cleaning , 2006, 22nd International Conference on Data Engineering (ICDE'06).
[9] Louise E. Moser,et al. Extracting data records from the web using tag path clustering , 2009, WWW '09.
[10] Nicholas Kushmerick,et al. Wrapper Induction for Information Extraction , 1997, IJCAI.
[11] Rajeev Motwani,et al. Robust and efficient fuzzy match for online data cleaning , 2003, SIGMOD '03.
[12] Rakesh Agrawal,et al. Fast Algorithms for Mining Association Rules , 1994, VLDB.
[13] Christopher D. Manning,et al. Introduction to Information Retrieval , 2010, J. Assoc. Inf. Sci. Technol..
[14] Eugene Agichtein,et al. Mining reference tables for automatic text segmentation , 2004, KDD.
[15] Andrew Tomkins,et al. The volume and evolution of web page templates , 2005, WWW '05.
[16] Valter Crescenzi,et al. RoadRunner: Towards Automatic Data Extraction from Large Web Sites , 2001, VLDB.
[17] Anuradha Bhamidipaty,et al. Interactive deduplication using active learning , 2002, KDD.
[18] Luis Gravano,et al. Text joins in an RDBMS for web data integration , 2003, WWW '03.
[19] Sunita Sarawagi,et al. Automatic segmentation of text into structured records , 2001, SIGMOD '01.
[20] Craig A. Knoblock,et al. Hierarchical Wrapper Induction for Semistructured Information Sources , 2004, Autonomous Agents and Multi-Agent Systems.
[21] Luis Gravano,et al. Snowball: extracting relations from large plain-text collections , 2000, DL '00.
[22] Bing Liu,et al. Web data extraction based on partial tree alignment , 2005, WWW '05.
[23] Ahmed K. Elmagarmid,et al. Duplicate Record Detection: A Survey , 2007, IEEE Transactions on Knowledge and Data Engineering.
[24] W. A. Beyer,et al. Some Biological Sequence Metrics , 1976.
[25] Divesh Srivastava,et al. Record linkage: similarity measures and algorithms , 2006, SIGMOD Conference.