Advances in Knowledge Discovery and Data Mining

Evaluating a trained system is an important component of machine learning, but labeling test data for large-scale evaluation of a trained model can be extremely time-consuming and expensive. In this paper we propose strategies for estimating the performance of a classifier using as few labeled samples as possible. Specifically, we assume a fixed labeling budget is given, and the goal is to obtain a precise estimate of classifier accuracy using only that budget. We show that our strategies can reduce the variance of the accuracy estimate by a significant amount compared to simple random sampling (over 65% in several cases). In terms of labeling resources, the reduction in the number of samples required (compared to random sampling) to estimate the classifier accuracy to within 1% error is as high as 60% in some cases.
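The paper's specific strategies are not reproduced here; as a minimal sketch of the general idea, the Python snippet below contrasts simple random sampling with stratified sampling on classifier confidence scores (an assumed auxiliary variable) under a fixed labeling budget. The function names, the equal-width binning rule, and the simulated data are all illustrative assumptions, not taken from the paper.

```python
import numpy as np


def srs_accuracy_estimate(correct, budget, rng):
    """Estimate accuracy by labeling `budget` test points chosen uniformly at random."""
    idx = rng.choice(len(correct), size=budget, replace=False)
    return correct[idx].mean()


def stratified_accuracy_estimate(correct, scores, budget, n_strata, rng):
    """Estimate accuracy by stratifying the test set on a confidence score.

    The test set is partitioned into equal-width score bins, the labeling
    budget is allocated proportionally to stratum size, and the per-stratum
    sample accuracies are combined weighted by stratum size.
    """
    edges = np.linspace(scores.min(), scores.max(), n_strata + 1)
    strata = np.digitize(scores, edges[1:-1])  # stratum index in 0..n_strata-1
    estimate = 0.0
    for s in range(n_strata):
        members = np.flatnonzero(strata == s)
        if members.size == 0:
            continue
        weight = members.size / correct.size
        # Proportional allocation (rounded); a real implementation would
        # enforce that the rounded allocations sum exactly to the budget.
        n_s = min(members.size, max(1, round(budget * weight)))
        idx = rng.choice(members, size=n_s, replace=False)
        estimate += weight * correct[idx].mean()
    return estimate


# Simulated classifier: confidence scores in (0, 1), with correctness
# correlated with confidence, so the score is an informative auxiliary variable.
rng = np.random.default_rng(0)
n = 100_000
scores = rng.beta(5, 2, size=n)
correct = rng.random(n) < scores

budget, trials = 500, 2000
srs = [srs_accuracy_estimate(correct, budget, rng) for _ in range(trials)]
strat = [stratified_accuracy_estimate(correct, scores, budget, 10, rng) for _ in range(trials)]
print(f"true accuracy       : {correct.mean():.4f}")
print(f"SRS estimate std    : {np.std(srs):.4f}")
print(f"stratified est. std : {np.std(strat):.4f}")
```

Because correctness is correlated with the score in this simulation, proportional stratification removes the between-stratum component of the estimator's variance, which is the kind of variance reduction the abstract quantifies.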
