Web page classification: Features and algorithms
[1] Susan T. Dumais,et al. Hierarchical classification of Web content , 2000, SIGIR '00.
[2] M. Indra Devi,et al. Feature Selection for Web Page Classification , 2009 .
[3] Min-Yen Kan. Web page classification without the web page , 2004, WWW Alt. '04.
[4] Ulf Hermjakob,et al. Parsing and Question Classification for Question Answering , 2001, ACL 2001.
[5] Yiming Yang,et al. An experimental study on large-scale web categorization , 2005, WWW '05.
[6] Fabrizio Silvestri,et al. Know your neighbors: web spam detection using the web topology , 2007, SIGIR.
[7] Evgeniy Gabrilovich,et al. Harnessing the Expertise of 70,000 Human Editors: Knowledge-Based Feature Generation for Text Categorization , 2007, J. Mach. Learn. Res..
[8] Xiaogang Peng,et al. Automatic web page classification in a dynamic and hierarchical way , 2002, Proceedings of the 2002 IEEE International Conference on Data Mining.
[9] Brian D. Davison. Topical locality in the Web , 2000, SIGIR '00.
[10] William W. Cohen. Improving a Page Classifier with Anchor Extraction and Link Analysis , 2002, NIPS.
[11] Dunja Mladenic,et al. Turning Yahoo to Automatic Web-Page Classifier , 1998, European Conference on Artificial Intelligence.
[12] Michael McGill,et al. Introduction to Modern Information Retrieval , 1983 .
[13] Thorsten Joachims,et al. WebWatcher: A Tour Guide for the World Wide Web , 1997, IJCAI.
[14] Nello Cristianini,et al. Composite Kernels for Hypertext Categorisation , 2001, ICML.
[15] Azriel Rosenfeld,et al. Scene Labeling by Relaxation Operations , 1976, IEEE Transactions on Systems, Man, and Cybernetics.
[16] Martin van den Berg,et al. Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery , 1999, Comput. Networks.
[17] Osmar R. Zaïane,et al. Finding Similar Queries to Satisfy Searches Based on Query Traces , 2002, OOIS Workshops.
[18] David A. Cohn,et al. The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity , 2000, NIPS.
[19] Filippo Menczer,et al. Algorithmic detection of semantic similarity , 2005, WWW '05.
[20] Chaomei Chen,et al. Mining the Web: Discovering knowledge from hypertext data , 2004, J. Assoc. Inf. Sci. Technol..
[21] Tong Zhang,et al. Linear prediction models with graph regularization for web-page categorization , 2006, KDD '06.
[22] Dell Zhang,et al. Question classification using support vector machines , 2003, SIGIR.
[23] Hong Qu,et al. Automated Blog Classification: Challenges and Pitfalls , 2006, AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs.
[24] Rohini K. Srihari,et al. Using Verbs and Adjectives to Automatically Classify Blog Sentiment , 2006, AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs.
[25] T. Landauer,et al. Indexing by Latent Semantic Analysis , 1990 .
[26] Gerhard Weikum,et al. Query-Log Based Authority Analysis for Web Information Search , 2004, WISE.
[27] Ronen Feldman,et al. The Data Mining and Knowledge Discovery Handbook , 2005 .
[28] Oren Etzioni,et al. Scaling question answering to the Web , 2001, WWW '01.
[29] Brian D. Davison,et al. Knowing a web page by the company it keeps , 2006, CIKM '06.
[30] Joseph Kaye,et al. Understanding how bloggers feel: recognizing affect in blog posts , 2006, CHI Extended Abstracts.
[31] Tom M. Mitchell,et al. Discovering Test Set Regularities in Relational Domains , 2000, ICML.
[32] Filippo Menczer,et al. Mapping the semantics of Web text and links , 2005, IEEE Internet Computing.
[33] Grace Hui Yang,et al. Web-based List Question Answering , 2004, COLING.
[34] Scott Nowson. The Language of Weblogs: A study of genre and individual differences , 2006 .
[35] Javed Mostafa,et al. An application of text categorization methods to gene ontology annotation , 2005, SIGIR '05.
[36] T. Joachims. WebWatcher: A Tour Guide for the World Wide Web , 1997 .
[37] Ee-Peng Lim,et al. Hierarchical text classification and evaluation , 2001, Proceedings 2001 IEEE International Conference on Data Mining.
[38] Foster J. Provost,et al. Classification in Networked Data: a Toolkit and a Univariate Case Study , 2007, J. Mach. Learn. Res..
[39] Haym Hirsh,et al. Using LSI for text classification in the presence of background text , 2001, CIKM '01.
[40] Ee-Peng Lim,et al. Web classification using support vector machine , 2002, WIDM '02.
[41] Evgeniy Gabrilovich,et al. Overcoming the Brittleness Bottleneck using Wikipedia: Enhancing Text Categorization with Encyclopedic Knowledge , 2006, AAAI.
[42] Hector Garcia-Molina,et al. Web Spam Taxonomy , 2005, AIRWeb.
[43] Richard M. Everson,et al. When Are Links Useful? Experiments in Text Classification , 2003, ECIR.
[44] Tom M. Mitchell,et al. Learning to Extract Symbolic Knowledge from the World Wide Web , 1998, AAAI/IAAI.
[45] Zenglin Xu,et al. Web page classification with heterogeneous data fusion , 2007, WWW '07.
[46] Hugo Liu,et al. A Corpus-based Approach to Finding Happiness , 2006, AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs.
[47] Bettina Berendt,et al. Tags are not metadata, but "just more content" - to some people , 2007, ICWSM.
[48] David H. Wolpert,et al. Stacked generalization , 1992, Neural Networks.
[49] Doug Beeferman,et al. Agglomerative clustering of a search engine query log , 2000, KDD '00.
[50] Berthier A. Ribeiro-Neto,et al. Combining link-based and content-based methods for web document classification , 2003, CIKM '03.
[51] Csaba Veres,et al. The Language of Folksonomies: What Tags Reveal About User Classification , 2006, NLDB.
[52] William R. Hersh. Text retrieval conference (TREC) genomics pre-track workshop , 2002, JCDL '02.
[53] Wolfgang Nejdl,et al. Utility analysis for topically biased PageRank , 2007, WWW '07.
[54] Kyoritsu Shuppan Co., Ltd. Computer Science: ACM Computing Surveys , 1978 .
[55] Johannes Fürnkranz,et al. Link-Local Features for Hypertext Classification , 2005, EWMF/KDO.
[56] Thomas Hofmann,et al. Probabilistic latent semantic indexing , 1999, SIGIR '99.
[57] Larry Fitzpatrick,et al. Automatic feedback using past queries: social searching? , 1997, SIGIR '97.
[58] Hector Garcia-Molina,et al. Link Spam Alliances , 2005, VLDB.
[59] Gilad Mishne,et al. Capturing Global Mood Levels using Blog Posts , 2006, AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs.
[60] Eli Upfal,et al. Web search using automatic classification , 1996, WWW 1996.
[61] Oren Kurland,et al. PageRank without hyperlinks: structural re-ranking using links induced by language models , 2005, SIGIR '05.
[62] Mounia Lalmas,et al. A probabilistic description-oriented approach for categorizing web documents , 1999, CIKM '99.
[63] Rayid Ghani,et al. Combining labeled and unlabeled data for text classification with a large number of categories , 2001, Proceedings 2001 IEEE International Conference on Data Mining.
[64] Pedro M. Domingos,et al. Learning to map between ontologies on the semantic web , 2002, WWW '02.
[65] Rajeev Motwani,et al. The PageRank Citation Ranking: Bringing Order to the Web , 1999, WWW 1999.
[66] Mika Käki,et al. Findex: search result categories help users when document ranking fails , 2005, CHI.
[67] Sung-Hyon Myaeng,et al. A practical hypertext categorization method using links and incrementally available class information , 2000, SIGIR '00.
[68] Aljoscha Klose. Extracting fuzzy classification rules from partially labeled data , 2004, Soft Comput..
[69] Einat Amitay,et al. Using common hypertext links to identify the best phrasal description of target web documents , 1998 .
[70] Ji-Rong Wen,et al. Query clustering using user logs , 2002, TOIS.
[71] Ben Choi,et al. Web Page Classification , 2005 .
[72] Natalie S. Glance,et al. Community search assistant , 2001, IUI '01.
[73] Soumen Chakrabarti,et al. Data mining for hypertext: a tutorial survey , 2000, SKDD.
[74] Gregory N. Hullender,et al. Learning to rank using gradient descent , 2005, ICML.
[75] Christoph Lindemann,et al. Coarse-grained classification of web sites by their structural properties , 2006, WIDM '06.
[76] Filip Radlinski,et al. Query chains: learning to rank from implicit feedback , 2005, KDD '05.
[77] Hans-Peter Kriegel,et al. Web site mining: a new way to spot competitors, customers and suppliers in the world wide web , 2002, KDD.
[78] David M. Pennock,et al. The structure of broad topics on the web , 2002, WWW.
[79] Siegfried Handschuh,et al. P-TAG: large scale automatic generation of personalized annotation tags for the web , 2007, WWW '07.
[80] Qiang Yang,et al. A comparison of implicit and explicit links for web page classification , 2006, WWW '06.
[81] Wei-Ying Ma,et al. Web-page classification through summarization , 2004, SIGIR '04.
[82] Thomas G. Dietterich. What is machine learning? , 2020, Archives of Disease in Childhood.
[83] Amit P. Sheth,et al. Altering document term vectors for classification: ontologies as expectations of co-occurrence , 2007, WWW '07.
[84] Tie-Yan Liu,et al. Adapting ranking SVM to document retrieval , 2006, SIGIR.
[85] Arlindo L. Oliveira,et al. An Empirical Comparison of Text Categorization Methods , 2003, SPIRE.
[86] Hugh E. Williams,et al. Strategies for minimising errors in hierarchical web categorisation , 2002, CIKM '02.
[87] Jong-Hyeok Lee,et al. Text categorization based on k-nearest neighbor approach for Web site classification , 2003, Inf. Process. Manag..
[88] Weiming Hu,et al. A Novel Web Page Filtering System by Combining Texts and Images , 2006, 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI'06).
[89] Songbo Tan,et al. Combining error-correcting output codes and model-refinement for text categorization , 2007, SIGIR.
[90] Liming Chen,et al. WebGuard: Web based adult content detection and filtering system , 2003, Proceedings IEEE/WIC International Conference on Web Intelligence (WI 2003).
[91] Giuseppe Attardi,et al. Automatic Web Page Categorization by Link and Context Analysis , 1999 .
[92] Jaideep Srivastava,et al. Web Mining , 2004, Data Mining and Knowledge Discovery.
[93] Arul Prakash Asirvatham,et al. Web Page Classification based on Document Structure , 2001 .
[94] Vincenzo Loia,et al. Personalized Knowledge Models Using RDF-Based Fuzzy Classification , 2006, Soft Computing in Web Information Retrieval.
[95] Yiming Yang,et al. A Study of Approaches to Hypertext Categorization , 2002, Journal of Intelligent Information Systems.
[96] Jiawei Han,et al. PEBL: Web page classification without negative examples , 2004, IEEE Transactions on Knowledge and Data Engineering.
[97] Richard O. Duda,et al. Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.
[98] Steffen Bickel,et al. Discovering Communities in Linked Data by Multi-view Clustering , 2005, GfKl.
[99] Hongyuan Zha,et al. Web document clustering using hyperlink structures , 2001 .
[100] Aixin Sun,et al. Blog Classification Using Tags: An Empirical Study , 2007, ICADL.
[101] Alan L. Rector,et al. Web ontology segmentation: analysis, classification and use , 2006, WWW '06.
[102] Dunja Mladenic,et al. Text-learning and related intelligent agents: a survey , 1999, IEEE Intell. Syst..
[103] Johannes Fürnkranz,et al. Hyperlink ensembles: a case study in hypertext classification , 2002, Inf. Fusion.
[104] Susan T. Dumais,et al. The Combination of Text Classifiers Using Reliability Indicators , 2016, Information Retrieval.
[105] Yihong Gong,et al. Combining content and link for classification using matrix factorization , 2007, SIGIR.
[106] Fabrizio Sebastiani,et al. Machine learning in automated text categorization , 2001, CSUR.
[107] Evgeniy Gabrilovich,et al. Text categorization with many redundant features: using aggressive feature selection to make SVMs competitive with C4.5 , 2004, ICML.
[108] Oren Kurland,et al. Respect my authority!: HITS without hyperlinks, utilizing cluster-based language models , 2006, SIGIR.
[109] Wei-Ying Ma,et al. OCFS: optimal orthogonal centroid feature selection for text categorization , 2005, SIGIR '05.
[110] Jong-Hyeok Lee,et al. Web page classification based on k-nearest neighbor approach , 2000, IRAL '00.
[111] David Carmel,et al. The connectivity sonar: detecting site functionality by structural patterns , 2003, HYPERTEXT '03.
[112] Hugh E. Williams,et al. Fast Categorisation of Large Document Collections , 2001, SPIRE.
[113] G. Mishne. Experiments with Mood Classification in , 2005 .
[114] David M. Pennock,et al. Using web structure for classifying and describing web pages , 2002, WWW.
[115] Shui-Lung Chuang,et al. Liveclassifier: creating hierarchical text classifiers through web corpora , 2004, WWW '04.
[116] Andrei Z. Broder,et al. A semantic approach to contextual advertising , 2007, SIGIR.
[117] Taher H. Haveliwala. Topic-sensitive PageRank , 2002, IEEE Trans. Knowl. Data Eng..
[118] Piotr Indyk,et al. Enhanced hypertext categorization using hyperlinks , 1998, SIGMOD '98.
[119] Lise Getoor,et al. Link mining: a survey , 2005, SKDD.
[120] Avrim Blum,et al. The Bottleneck , 2021, Monopsony Capitalism.
[121] Shivani Agarwal,et al. Ranking on graph data , 2006, ICML.
[122] Thomas G. Dietterich,et al. Solving Multiclass Learning Problems via Error-Correcting Output Codes , 1994, J. Artif. Intell. Res..
[123] Sanda M. Harabagiu,et al. Experiments with Open-Domain Textual Question Answering , 2000, COLING.
[124] Taher H. Haveliwala. Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search , 2003, IEEE Trans. Knowl. Data Eng..
[125] Koraljka Golub,et al. Importance of HTML structural elements and metadata in automated subject classification , 2005 .
[126] Rayid Ghani,et al. Combining Labeled and Unlabeled Data for MultiClass Text Categorization , 2002, ICML.
[127] Yugyung Lee,et al. OntoKhoj: a semantic web portal for ontology searching, ranking and classification , 2003, WIDM '03.
[128] Michael J. Pazzani,et al. Syskill & Webert: Identifying Interesting Web Sites , 1996, AAAI/IAAI, Vol. 1.
[129] Andrei Z. Broder,et al. Robust classification of rare queries using web knowledge , 2007, SIGIR.
[130] Hendrik Blockeel,et al. Web mining research: a survey , 2000, SKDD.
[131] Yiming Yang,et al. Support vector machines classification with a very large-scale taxonomy , 2005, SKDD.
[132] Ludmila I. Kuncheva,et al. Combining Pattern Classifiers: Methods and Algorithms , 2004 .
[133] Yasuhiro Suzuki,et al. Automatically collecting, monitoring, and mining japanese weblogs , 2004, WWW Alt. '04.
[134] Johannes Fürnkranz,et al. Web Mining , 2005, Data Mining and Knowledge Discovery Handbook.
[135] Evgeniy Gabrilovich,et al. Parameterized generation of labeled datasets for text categorization based on a hierarchical directory , 2004, SIGIR '04.
[136] Shui-Lung Chuang,et al. Using a web-based categorization approach to generate thematic metadata from texts , 2004, TALIP.
[137] Jennifer Neville,et al. Why collective inference improves relational classification , 2004, KDD.
[138] Daphne Koller,et al. Support Vector Machine Active Learning with Applications to Text Classification , 2000, J. Mach. Learn. Res..
[139] Qiang Yang,et al. Reinforcing Web-object Categorization Through Interrelationships , 2006, Data Mining and Knowledge Discovery.
[140] Subhash C. Bagui,et al. Combining Pattern Classifiers: Methods and Algorithms , 2005, Technometrics.
[141] Ah-Hwee Tan,et al. Text Mining: The state of the art and the challenges , 2000 .
[142] Hugh E. Williams,et al. Simple and accurate feature selection for hierarchical categorisation , 2002, DocEng '02.
[143] Wei Liu,et al. Importance-Based Web Page Classification Using Cost-Sensitive SVM , 2005, WAIM.
[144] John M. Pierre,et al. On the Automated Classification of Web Sites , 2001, ArXiv.
[145] Dunja Mladenic,et al. Turning Yahoo! into an automatic Web page classifier , 1998 .
[146] Weiguo Fan,et al. Discretization based learning approach to information retrieval , 2005, EMNLP 2005.
[147] Byoung-Tak Zhang,et al. Large Scale Unstructured Document Classification Using Unlabeled Data and Syntactic Information , 2003, PAKDD.
[148] Lise Getoor,et al. Link-Based Classification , 2003, Encyclopedia of Machine Learning and Data Mining.
[149] Brian D. Davison,et al. Topical link analysis for web search , 2006, SIGIR.
[150] Eric Brill,et al. Beyond PageRank: machine learning for static ranking , 2006, WWW '06.
[151] Fabrizio Sebastiani,et al. A Tutorial on Automated Text Categorisation , 2000 .
[152] Vaughan R. Shanks,et al. Fast categorisation of large document collections , 2001, Proceedings Eighth Symposium on String Processing and Information Retrieval.
[153] Lise Getoor,et al. Link-Based Classification , 2003, Encyclopedia of Machine Learning and Data Mining.
[154] Stefan Siersdorfer,et al. A neighborhood-based approach for clustering of linked document collections , 2006, CIKM '06.
[155] Maarten de Rijke,et al. Learning to Recognize Blogs: A Preliminary Exploration , 2006 .
[156] Thomas Hofmann,et al. Probabilistic Latent Semantic Analysis , 1999, UAI.
[157] Yiming Yang,et al. A scalability analysis of classifiers in text categorization , 2003, SIGIR.
[158] Kjersti Aas,et al. Text Categorisation: A Survey , 1999 .
[159] Susan T. Dumais,et al. Bringing order to the Web: automatically categorizing search results , 2000, CHI.
[160] Min-Yen Kan,et al. Fast webpage classification using URL features , 2005, CIKM '05.
[161] Johannes Fürnkranz,et al. Exploiting Structural Information for Text Classification on the WWW , 1999, IDA.
[162] Wen Gao,et al. Two-phase Web site classification based on hidden Markov tree models , 2003, Proceedings IEEE/WIC International Conference on Web Intelligence (WI 2003).
[163] Andreas Hotho,et al. Tag Recommendations in Folksonomies , 2007, LWA.
[164] Yiming Yang,et al. A Comparative Study on Feature Selection in Text Categorization , 1997, ICML.
[165] Grace Hui Yang,et al. Effectiveness of web page classification on finding list answers , 2004, SIGIR '04.
[166] Veljko Milutinovic,et al. Visual Adjacency Multigraphs – a Novel Approach for a Web Page Classification , 2004 .
[167] Thorsten Joachims,et al. Optimizing search engines using clickthrough data , 2002, KDD.
[168] Svetlana Kiritchenko,et al. Hierarchical text categorization and its application to bioinformatics , 2006 .
[169] Gerhard Weikum,et al. Graph-based text classification: learn from your neighbors , 2006, SIGIR.
[170] Yihong Gong,et al. Multi-labelled classification using maximum entropy method , 2005, SIGIR '05.
[171] Evgeniy Gabrilovich,et al. Feature Generation for Text Categorization Using World Knowledge , 2005, IJCAI.
[172] Thorsten Joachims,et al. WebWatcher: A Learning Apprentice for the World Wide Web , 1995 .
[173] Brian D. Davison. The potential of the metasearch engine , 2005, ASIST.
[174] Benno Stein,et al. Genre Classification of Web Pages , 2004, KI.
[175] Yiming Yang,et al. Hypertext Categorization using Hyperlink Patterns and Meta Data , 2001, ICML.
[176] Witold Pedrycz,et al. Proximity-Based Supervision for Flexible Web Pages Categorization , 2004 .