[1] Ashish Agarwal,et al. Overlapping experiment infrastructure: more, better, faster experimentation , 2010, KDD.
[2] Gleb Gusev,et al. Periodicity in User Engagement with a Search Engine and Its Application to Online Controlled Experiments , 2017, ACM Trans. Web.
[3] Ron Kohavi,et al. Trustworthy online controlled experiments: five puzzling outcomes explained , 2012, KDD.
[4] Craig MacDonald,et al. Optimised Scheduling of Online Experiments , 2015, SIGIR.
[5] Diane Tang,et al. Focusing on the Long-term: It's Good for Users and Business , 2015, KDD.
[6] Thorsten Joachims,et al. Evaluating Retrieval Performance Using Clickthrough Data , 2003, Text Mining.
[7] Francesco Bonchi,et al. From "Dango" to "Japanese Cakes": Query Reformulation Models and Patterns , 2009, 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology.
[8] Craig MacDonald,et al. Generalized Team Draft Interleaving , 2015, CIKM.
[9] Ron Kohavi,et al. Practical guide to controlled experiments on the web: listen to your customers not to the hippo , 2007, KDD '07.
[10] Gleb Gusev,et al. Boosted Decision Tree Regression Adjustment for Variance Reduction in Online Controlled Experiments , 2016, KDD.
[11] Edoardo M. Airoldi,et al. Detecting Network Effects: Randomizing Over Randomized Experiments , 2017, KDD.
[12] Huizhi Xie,et al. Improving the Sensitivity of Online Controlled Experiments: Case Studies at Netflix , 2016, KDD.
[13] Shuchi Chawla,et al. A/B Testing of Auctions , 2016, EC.
[14] Gleb Gusev,et al. Practical Aspects of Sensitivity in Online Experimentation with User Engagement Metrics , 2015, CIKM.
[15] Filip Radlinski,et al. Optimized interleaving for online retrieval evaluation , 2013, WSDM.
[16] Ryen W. White,et al. Modeling dwell time to predict click-level satisfaction , 2014, WSDM.
[17] Gleb Gusev,et al. Future User Engagement Prediction and Its Application to Improve the Sensitivity of Online Experiments , 2015, WWW.
[18] Dean Eckles,et al. Uncertainty in online experiments with dependent data: an evaluation of bootstrap methods , 2013, KDD.
[19] Ron Kohavi,et al. Online Experimentation at Microsoft , 2009.
[20] Katja Hofmann,et al. A probabilistic method for inferring preferences from clicks , 2011, CIKM '11.
[21] Milad Shokouhi. Detecting seasonal queries by time-series analysis , 2011, SIGIR '11.
[22] Ron Kohavi,et al. Improving the sensitivity of online controlled experiments by utilizing pre-experiment data , 2013, WSDM.
[23] Gleb Gusev,et al. Using the Delay in a Treatment Effect to Improve Sensitivity and Preserve Directionality of Engagement Metrics in A/B Experiments , 2017, WWW.
[24] Xian Wu,et al. Measuring Metrics , 2016, CIKM.
[25] Michael Bailey,et al. People and Cookies: Imperfect Treatment Assignment in Online Experiments , 2016, WWW.
[26] Yang Song,et al. Evaluating and predicting user engagement change with degraded search relevance , 2013, WWW.
[27] Yu Guo,et al. Statistical inference in two-stage online controlled experiments with treatment selection and validation , 2014, WWW.
[28] Alex Deng,et al. Diluted Treatment Effect Estimation for Trigger Analysis in Online Controlled Experiments , 2015, WSDM.
[29] M. de Rijke,et al. Multileaved Comparisons for Fast Online Evaluation , 2014, CIKM.
[30] Gleb Gusev,et al. Extreme States Distribution Decomposition Method for Search Engine Online Evaluation , 2015, KDD.
[31] Ron Kohavi,et al. Online controlled experiments at large scale , 2013, KDD.
[32] Trevor Hastie,et al. Some methods for heterogeneous treatment effect estimation in high dimensions , 2017, Statistics in medicine.
[33] Ron Kohavi,et al. Seven pitfalls to avoid when running controlled experiments on the web , 2009, KDD.
[34] S. T. Buckland,et al. An Introduction to the Bootstrap , 1994.
[35] Craig MacDonald,et al. Sequential Testing for Early Stopping of Online Experiments , 2015, SIGIR.
[36] Ya Xu,et al. Evaluating Mobile Apps with A/B and Quasi A/B Tests , 2016, KDD.
[37] Alexey Drutsa. Sign-Aware Periodicity Metrics of User Engagement for Online Search Quality Evaluation , 2015, SIGIR.
[38] Rosie Jones,et al. Beyond the session timeout: automatic hierarchical segmentation of search topics in query logs , 2008, CIKM '08.
[39] Anmol Bhasin,et al. Network A/B Testing: From Sampling to Estimation , 2015, WWW.
[40] Alex Deng,et al. Data-Driven Metric Development for Online Controlled Experiments: Seven Lessons Learned , 2016, KDD.
[41] Thorsten Joachims,et al. Unbiased Evaluation of Retrieval Quality using Clickthrough Data , 2002.
[42] Ron Kohavi,et al. Controlled experiments on the web: survey and practical guide , 2009, Data Mining and Knowledge Discovery.
[43] Alexey Drutsa,et al. Consistent Transformation of Ratio Metrics for Efficient Online Controlled Experiments , 2018, WSDM.
[44] Ron Kohavi,et al. Seven rules of thumb for web site experimenters , 2014, KDD.
[45] Filip Radlinski,et al. Practical online retrieval evaluation , 2011, SIGIR.
[46] Pavel Serdyukov,et al. Search Engine Evaluation based on Search Engine Switching Prediction , 2015, SIGIR.
[47] Pete Koomen,et al. Peeking at A/B Tests: Why it matters, and what to do about it , 2017, KDD.
[48] Mounia Lalmas,et al. Measuring User Engagement , 2014, Measuring User Engagement.
[49] Eugene Kharitonov,et al. Learning Sensitive Combinations of A/B Test Metrics , 2017, WSDM.
[50] Alex Deng,et al. Continuous Monitoring of A/B Tests without Pain: Optional Stopping in Bayesian Testing , 2016, 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA).
[51] Robert Tibshirani,et al. An Introduction to the Bootstrap , 1994.
[52] Shie Mannor,et al. A Nonparametric Sequential Test for Online Randomized Experiments , 2016, WWW.
[53] Milad Shokouhi,et al. On correlation of absence time and search effectiveness , 2014, SIGIR.
[54] Susan Athey,et al. Recursive partitioning for heterogeneous causal effects , 2015, Proceedings of the National Academy of Sciences.
[55] Gleb Gusev,et al. Engagement Periodicity in Search Engine Usage: Analysis and its Application to Search Quality Evaluation , 2015, WSDM.
[56] Filip Radlinski,et al. Large-scale validation and analysis of interleaved search evaluation , 2012, TOIS.
[57] Hilary Hutchinson,et al. Measuring the user experience on a large scale: user-centered metrics for web applications , 2010, CHI.
[58] Filip Radlinski,et al. How does clickthrough data reflect retrieval quality? , 2008, CIKM '08.
[59] Anmol Bhasin,et al. From Infrastructure to Culture: A/B Testing Challenges in Large Scale Social Networks , 2015, KDD.
[60] M. de Rijke,et al. Online Learning to Rank for Information Retrieval: SIGIR 2016 Tutorial , 2016, SIGIR.