Web mining: Machine learning for web applications

The author examines research on machine learning and traditional information retrieval techniques, and their possible applications to Web mining systems. Web mining is understood as the discovery and analysis of useful information on the Web, a field of study at the intersection of information retrieval, Web search, machine learning, databases, data mining and text mining. Web mining research falls into three categories, depending on whether it addresses the content of the Web, its structure or its usage.
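The three categories can be made concrete with a minimal Python sketch on a toy dataset. Everything here is a hypothetical illustration: the page texts, hyperlinks and access log, and the names `PAGES`, `LINKS`, `ACCESS_LOG`, `content_mining`, `structure_mining` and `usage_mining`, are assumptions. Each function uses a deliberately simple stand-in for its category: term counting for content, hyperlink in-degree for structure, and page-to-page transition counts for usage. None of them is one of the machine learning methods the review surveys.

```python
from collections import Counter

# Hypothetical toy "Web": page text, hyperlinks, and a server access log.
PAGES = {
    "a.html": "web mining applies machine learning to web data",
    "b.html": "machine learning for information retrieval",
    "c.html": "link analysis of the web graph",
}
LINKS = [("a.html", "b.html"), ("a.html", "c.html"), ("b.html", "c.html")]
ACCESS_LOG = [  # (user, page) requests in time order
    ("u1", "a.html"), ("u1", "b.html"),
    ("u2", "a.html"), ("u2", "c.html"),
    ("u1", "c.html"),
]


def content_mining(pages):
    """Content: count how often each term occurs across all page texts."""
    terms = Counter()
    for text in pages.values():
        terms.update(text.split())
    return terms


def structure_mining(links):
    """Structure: in-degree of each page in the hyperlink graph."""
    return Counter(target for _, target in links)


def usage_mining(log):
    """Usage: count consecutive page-to-page transitions per user."""
    last_page = {}
    transitions = Counter()
    for user, page in log:
        if user in last_page:
            transitions[(last_page[user], page)] += 1
        last_page[user] = page
    return transitions


if __name__ == "__main__":
    print(content_mining(PAGES).most_common(3))
    print(structure_mining(LINKS).most_common())
    print(usage_mining(ACCESS_LOG).most_common())
```

The sketch stops at raw counts; in a real system these would typically serve as features for a later step such as a text classifier or a link-analysis ranking.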
