Practical lessons of data mining at Yahoo!

The use of data in many commercial applications has grown at an unprecedented pace over the last decade. While successful data mining efforts have led to major business advances, numerous less publicized efforts have failed for one reason or another. In this paper, we discuss practical lessons drawn from years of data mining experience at Yahoo! and offer insights into how to drive a data mining effort to success in a business environment. We use two significant Yahoo! applications as illustrative examples, shopping categorization and behavioral targeting, and reflect on four success factors: methodology, data, infrastructure, and people.
