Advanced Techniques in Web Data Pre-processing and Cleaning
Pablo E. Román | Robert F. Dell | Juan D. Velásquez
[1] Martin F. Porter,et al. An algorithm for suffix stripping , 1997, Program.
[2] James E. Pitkow,et al. Characterizing Browsing Strategies in the World-Wide Web , 1995, Comput. Networks ISDN Syst..
[3] James E. Pitkow,et al. Characterizing Browsing Behaviors on the World-Wide Web , 1995 .
[4] Saul Greenberg,et al. Revisitation patterns in World Wide Web navigation , 1997, CHI.
[5] Huberman,et al. Strong regularities in world wide web surfing , 1998, Science.
[6] Jon M. Kleinberg,et al. Mining the Web's Link Structure , 1999, Computer.
[7] David J. Hand,et al. Statistics and data mining: intersecting disciplines , 1999, SKDD.
[8] Charles Aulds. Linux Apache Web Server Administration , 2000 .
[9] Hector Garcia-Molina,et al. The Evolution of the Web and Implications for an Incremental Crawler , 2000, VLDB.
[10] Tao Luo,et al. Effective personalization based on association rule discovery from web usage data , 2001, WIDM '01.
[11] Mark Levene,et al. Zipf's Law for Web Surfers , 2001, Knowledge and Information Systems.
[12] Rayid Ghani,et al. Mining the web to create minority language corpora , 2001, CIKM '01.
[13] Sankar K. Pal,et al. Web mining in soft computing framework: relevance, state of the art and future directions , 2002, IEEE Trans. Neural Networks.
[14] Tatsunori Mori,et al. Information Gain Ratio as Term Weight: The case of Summarization of IR Results , 2002, COLING.
[15] Fabrizio Sebastiani,et al. Machine learning in automated text categorization , 2001, CSUR.
[16] Ravi Kumar,et al. Self-similarity in the web , 2001, TOIT.
[17] Robert E. Bixby,et al. Solving Real-World Linear Programs: A Decade and More of Progress , 2002, Oper. Res..
[18] Terumasa Aoki,et al. Using Self Organizing Feature Maps to Acquire Knowledge about Visitor Behavior in a Web Site , 2003, KES.
[19] Myra Spiliopoulou,et al. A Framework for the Evaluation of Session Reconstruction Heuristics in Web-Usage Analysis , 2003, INFORMS J. Comput..
[20] Marc Najork,et al. A large‐scale study of the evolution of Web pages , 2003, WWW '03.
[21] Jason J. Jung,et al. Semantic Outlier Analysis for Sessionizing Web Logs , 2003 .
[22] Chengqi Zhang,et al. Toward databases mining: Pre-processing collected data , 2003, Appl. Artif. Intell..
[23] Yuna Kim,et al. Web prefetching using display-based prediction , 2003, Proceedings IEEE/WIC International Conference on Web Intelligence (WI 2003).
[24] Javed I. Khan,et al. Exploiting Webspace organization for accelerating Web prefetching , 2003, Proceedings IEEE/WIC International Conference on Web Intelligence (WI 2003).
[25] Hector Garcia-Molina,et al. Estimating frequency of change , 2003, TOIT.
[26] D. Langford. Internet Ethics , 2003 .
[27] M. Tamer Özsu,et al. A Web page prediction model based on click-stream tree representation of user behavior , 2003, KDD '03.
[28] Lakhmi C. Jain,et al. Knowledge-Based Intelligent Information and Engineering Systems , 2004, Lecture Notes in Computer Science.
[29] Jason J. Jung. Ontology-Based Partitioning of Data Steam for Web Mining: A Case Study of Web Logs , 2004, International Conference on Computational Science.
[30] Jin Chen,et al. A Preprocessing Framework and Approach for Web Applications , 2004, J. Web Eng..
[31] J. Srivastava,et al. Mining Temporally Evolving Graphs , 2004 .
[32] Teuvo Kohonen,et al. Self-organized formation of topologically correct feature maps , 2004, Biological Cybernetics.
[33] Jaideep Srivastava,et al. Mining Temporally Changing Web Usage Graphs , 2004, WebKDD.
[34] Luis Gravano,et al. When one sample is not enough: improving text database selection using shrinkage , 2004, SIGMOD '04.
[35] Svetlana Hensman,et al. Construction of Conceptual Graph Representation of Texts , 2004, NAACL.
[36] Tao Luo,et al. Discovery and Evaluation of Aggregate Usage Profiles for Web Personalization , 2004, Data Mining and Knowledge Discovery.
[37] Ricardo A. Baeza-Yates,et al. Dynamics of the Chilean Web Structure , 2004, WebDyn@WWW.
[38] Atanas Kiryakov,et al. KIM – a semantic platform for information extraction and retrieval , 2004, Natural Language Engineering.
[39] John Linn,et al. Technology and web user data privacy - a survey of risks and countermeasures , 2005, IEEE Security & Privacy.
[40] Xindong Wu,et al. Support vector machines based on K-means clustering for real-time business intelligence systems , 2005, Int. J. Bus. Intell. Data Min..
[41] Sandip Debnath,et al. Automatic identification of informative sections of Web pages , 2005, IEEE Transactions on Knowledge and Data Engineering.
[42] Michihiko Minoh,et al. Modeling hypermedia-based communication , 2005, Inf. Sci..
[43] Theo P. van der Weide,et al. A formal derivation of Heaps' Law , 2005, Inf. Sci..
[44] Chew Lim Tan,et al. A comprehensive comparative study on term weighting schemes for text categorization with support vector machines , 2005, WWW '05.
[45] Carlos Castillo,et al. Effective web crawling , 2005, SIGF.
[46] Christos Faloutsos,et al. Graph mining: Laws, generators, and algorithms , 2006, CSUR.
[47] Yong Wang,et al. Document Clustering with Semantic Analysis , 2006, Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS'06).
[48] Kin Keung Lai,et al. An integrated data preparation scheme for neural network data analysis , 2006, IEEE Transactions on Knowledge and Data Engineering.
[49] Spencer Rugaber,et al. Problems Modeling Web Sites and User Behavior , 2006, 2006 Eighth IEEE International Symposium on Web Site Evolution (WSE'06).
[50] Chien-Chung Chan,et al. Active User-Based and Ontology-Based Web Log Data Preprocessing for Web Usage Mining , 2006, 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006 Main Conference Proceedings)(WI'06).
[51] J. Copas,et al. Interpreting Kullback-Leibler divergence with the Neyman-Pearson lemma , 2006 .
[52] Wilfred Ng,et al. Web dynamics and their ramifications for the development of Web search engines , 2006, Comput. Networks.
[53] Eelco Herder,et al. Off the beaten tracks: exploring three aspects of web navigation , 2006, WWW '06.
[54] Mitsuru Ishizuka,et al. Temporal multi-page summarization , 2006, Web Intell. Agent Syst..
[55] David Nadeau,et al. Semi-supervised named entity recognition: learning to recognize 100 entity types with little supervision , 2007 .
[56] A. Sima Etaner-Uyar,et al. Effects of Session Representation Models on the Performance of Web Recommender Systems , 2007, 2007 IEEE 23rd International Conference on Data Engineering Workshop.
[57] Thomas Wilhelm,et al. Metasploit Toolkit for Penetration Testing, Exploit Development, and Vulnerability Research , 2007 .
[58] Pablo Castells,et al. An Adaptation of the Vector-Space Model for Ontology-Based Information Retrieval , 2007, IEEE Transactions on Knowledge and Data Engineering.
[59] Mengjun Xie,et al. Automatic Cookie Usage Setting with CookiePicker , 2007, 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07).
[60] Eelco Herder,et al. Web page revisitation revisited: implications of a long-term click-stream study of browser usage , 2007, CHI.
[61] Fang Wu,et al. The economics of attention: maximizing user value in information-rich environments , 2007, ADKDD '07.
[62] Ryen W. White,et al. Investigating Behavioral Variability in Web Search , 2007, WWW '07.
[63] Scott Dick,et al. A Survey and Analysis of the P3P Protocol's Agents, Adoption, Maintenance, and Future , 2007, IEEE Transactions on Dependable and Secure Computing.
[64] Filip Radlinski,et al. Evaluating the accuracy of implicit feedback from clicks and query reformulations in Web search , 2007, TOIS.
[65] Ricardo A. Baeza-Yates,et al. Characterization of national Web domains , 2007, TOIT.
[66] Deepayan Chakrabarti,et al. Page-level template detection via isotonic smoothing , 2007, WWW '07.
[67] Rohini K. Srihari,et al. Graph-based text representation and knowledge discovery , 2007, SAC '07.
[68] Bing Liu,et al. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data , 2006, Data-Centric Systems and Applications.
[69] David Maynor,et al. Chapter 1 – Introduction to Metasploit , 2007 .
[70] Charles V. Wright,et al. On Web Browsing Privacy in Anonymized NetFlows , 2007, USENIX Security Symposium.
[71] John Yen,et al. Advances in Web Mining and Web Usage Analysis, 8th International Workshop on Knowledge Discovery on the Web, WebKDD 2006, Philadelphia, PA, USA, August 20, 2006, Revised Papers , 2007, WebKDD.
[72] Sankar K. Pal,et al. Stemming via Distribution-Based Word Segregation for Classification and Retrieval , 2007, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).
[73] Olfa Nasraoui,et al. Web data mining: exploring hyperlinks, contents, and usage data , 2008, SKDD.
[74] Wolfgang Nejdl,et al. Semantically Enhanced Entity Ranking , 2008, WISE.
[75] Lori Lorigo,et al. Eye Monitoring in Online Search , 2008 .
[76] Yan Li,et al. Research on Path Completion Technique in Web Usage Mining , 2008, 2008 International Symposium on Computer Science and Computational Technology.
[77] Filip Radlinski,et al. How does clickthrough data reflect retrieval quality? , 2008, CIKM '08.
[78] Jie Li,et al. Characterizing typical and atypical user sessions in clickstreams , 2008, WWW.
[79] Sandeep Pandey,et al. Recrawl scheduling based on information longevity , 2008, WWW.
[80] Pablo E. Román,et al. Web User Session Reconstruction Using Integer Programming , 2008, 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology.
[81] Wilson C. Hsieh,et al. Bigtable: A Distributed Storage System for Structured Data , 2006, TOCS.
[82] Ruimin Shen,et al. Why web 2.0 is good for learning and for research: principles and prototypes , 2008, WWW.
[83] V. Palade,et al. Adaptive Web Sites - A Knowledge Extraction from Web Data Approach , 2008, Frontiers in Artificial Intelligence and Applications.
[84] Andy Cockburn,et al. An empirical characterisation of electronic document navigation , 2008, Graphics Interface.
[85] Eelco Herder,et al. Not quite the average: An empirical study of Web use , 2008, TWEB.
[86] Gerhard Weikum,et al. Efficiently Handling Dynamics in Distributed Link Based Authority Analysis , 2008, WISE.
[87] Susan T. Dumais,et al. The web changes everything: understanding the dynamics of web content , 2009, WSDM '09.
[88] Václav Snásel,et al. Web Content Mining Focused on Named Objects , 2009, IHCI.
[89] Shady Shehata,et al. A WordNet-Based Semantic Model for Enhancing Text Clustering , 2009, 2009 IEEE International Conference on Data Mining Workshops.
[90] Petra Benkovská,et al. Web Usage Mining , 2009, Encyclopedia of Database Systems.
[91] Bruce Bukiet,et al. Internet Search Result Probabilities: Heaps' Law and Word Associativity* , 2009, J. Quant. Linguistics.
[92] M.C. Monard,et al. Improvement on the Porter's Stemming Algorithm for Portuguese , 2009, IEEE Latin America Transactions.
[93] Radek Burget,et al. Web Page Element Classification Based on Visual Features , 2009, 2009 First Asian Conference on Intelligent Information and Database Systems.
[94] Jason I. Hong,et al. Contextual web history: using visual and contextual cues to improve web browser history , 2009, CHI.
[95] Jian Su,et al. Supervised and Traditional Term Weighting Methods for Automatic Text Categorization , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[96] Peter Wittek,et al. Improving Text Classification by a Sense Spectrum Approach to Term Expansion , 2009, CoNLL.
[97] Juan D. Velásquez,et al. Design and Implementation of a Methodology for Identifying Website Keyobjects , 2009, KES.
[98] Maria Moloney,et al. A Privacy Control Theory for Online Environments , 2009, 2009 42nd Hawaii International Conference on System Sciences.
[99] Ibrahim Türkoglu,et al. Creating meaningful data from web logs for improving the impressiveness of a website by using path analysis method , 2009, Expert Syst. Appl..
[100] Murat Ali Bayir,et al. Smart Miner: a new framework for mining large scale web usage data , 2009, WWW '09.
[101] Lora Aroyo,et al. The Semantic Web: Research and Applications , 2009, Lecture Notes in Computer Science.
[102] Wolfgang Nejdl,et al. How to Trace and Revise Identities , 2009, ESWC.
[103] Felix Naumann,et al. Data fusion , 2009, CSUR.
[104] Pablo E. Román,et al. Web User Session Reconstruction with Back Button Browsing , 2009, KES.
[105] Ana Pont,et al. Dweb model: Representing Web 2.0 dynamism , 2009, Comput. Commun..
[106] Ninghui Li,et al. End-User Privacy in Human–Computer Interaction , 2009 .
[107] Gerhard Weikum,et al. Data quality in web archiving , 2009, WICOW.
[108] Marius Kloft,et al. Active and Semi-supervised Data Domain Description , 2009, ECML/PKDD.
[109] Jason Alexander,et al. Understanding and improving navigation within electronic documents , 2009 .
[110] Brian D. Davison,et al. Web page classification: Features and algorithms , 2009, CSUR.
[111] Wolfgang Maass,et al. Ontology-Based Natural Language Processing for In-store Shopping Situations , 2009, 2009 IEEE International Conference on Semantic Computing.
[112] Pablo E. Román,et al. A Dynamic Stochastic Model Applied to the Analysis of the Web User Behavior , 2010 .
[113] James A. Thom,et al. Entity Extraction from the Web with WebKnox , 2010 .
[114] Iraklis Varlamis,et al. An Experimental Study on Unsupervised Graph-based Word Sense Disambiguation , 2010, CICLing.
[115] Olfa Nasraoui,et al. Web Usage Mining , 2011 .