How good are machine learning clouds for binary classification with good features?: extended abstract

Despite recent advances in machine learning research, modern machine learning systems remain far from easy to use, at least for business users and for scientists without a computer science background. Recently, there has been a trend toward offering machine learning on the cloud as a "service," an approach known as machine learning clouds. By exposing a set of machine learning primitives on the cloud, these services significantly raise the level of abstraction for machine learning. For example, with Amazon Machine Learning, users only need to upload a dataset and specify the type of task (classification or regression); the cloud then trains machine learning models without any user intervention.
