Teaching Statistics at Google-Scale

Modern data and applications pose very different challenges from those of the 1950s or even the 1980s. Students contemplating a career in statistics or data science need to have the tools to tackle problems involving massive, heavy-tailed data, often interacting with live, complex systems. However, despite the deepening connections between engineering and modern data science, we argue that training in classical statistical concepts plays a central role in preparing students to solve Google-scale problems. To this end, we present three industrial applications where significant modern data challenges were overcome by statistical thinking. [Received December 2014. Revised August 2015.]
