Learning extreme verification latency quickly with importance weighting: FAST COMPOSE & LEVEL_IW

Muhammad Umer
2016-2017
Robi Polikar, Ph.D.
Master of Science in Electrical & Computer Engineering

One of the more challenging real-world problems in computational intelligence is learning from nonstationary streaming data, whose underlying distribution changes over time (concept drift). An even more challenging version of this scenario arises when, following a small set of initially labeled data, the data stream consists of unlabeled data only. This scenario is typically referred to as learning in initially labeled nonstationary environments, or simply as extreme verification latency (EVL). This thesis introduces two algorithms for this setting. The first is a simple modification of our prior work, COMPOSE (COMPacted Object Sample Extraction), that allows the algorithm to work without its computationally very expensive core support extraction module; we call this modified algorithm FAST COMPOSE. The second algorithm is based on importance weighting, a domain adaptation approach. We use importance weighting to match the distributions of two consecutive time steps, and estimate the posterior distribution of the unlabeled data using an importance-weighted least-squares probabilistic classifier. The estimated labels are then used, iteratively, as the training data for the next time step. We call this algorithm LEVELIW, short for Learning Extreme VErification Latency with Importance Weighting. An additional important contribution of this thesis is a comprehensive survey and comparative analysis of competing algorithms, which uses several synthetic and real-world datasets to point out the strengths and weaknesses of different approaches from three perspectives: classification accuracy, computational complexity, and parameter sensitivity.
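The LEVELIW loop described above (weight the previous batch toward the current one, fit an importance-weighted least-squares probabilistic classifier, then reuse the estimated labels as the next step's training data) can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the thesis implementation. The abstract does not say how the importance weights are estimated, so the sketch assumes uLSIF (unconstrained least-squares importance fitting) with Gaussian kernels centred on the new batch. The classifier is a simplified IWLSPC that shares one Gaussian basis across classes. The function names, the kernel width `sigma`, the regularizer `lam`, and the drifting two-Gaussian stream are all hypothetical.

```python
import math
import random


def features(x, centers, sigma):
    """Gaussian basis functions exp(-|x - c|^2 / (2 sigma^2)) evaluated at x."""
    return [math.exp(-sum((a - b) ** 2 for a, b in zip(x, c)) / (2 * sigma ** 2))
            for c in centers]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ridge(Phi, w, h, lam):
    """theta = (H + lam*I)^-1 h, where H = mean_i w_i * phi_i phi_i^T."""
    n, b = len(Phi), len(h)
    H = [[sum(w[i] * Phi[i][r] * Phi[i][c] for i in range(n)) / n
          + (lam if r == c else 0.0) for c in range(b)] for r in range(b)]
    return solve(H, h)


def ulsif_weights(X_src, X_tgt, sigma=1.0, lam=0.1):
    """Assumed uLSIF-style estimate of w(x) = p_tgt(x) / p_src(x) at the source points."""
    Phi_src = [features(x, X_tgt, sigma) for x in X_src]
    h = [sum(col) / len(X_tgt) for col in zip(*[features(x, X_tgt, sigma) for x in X_tgt])]
    alpha = [max(0.0, a) for a in ridge(Phi_src, [1.0] * len(X_src), h, lam)]
    return [sum(a * p for a, p in zip(alpha, phi)) for phi in Phi_src]


def iwlspc_labels(X_src, y_src, w, X_tgt, classes, sigma=1.0, lam=0.1):
    """Simplified importance-weighted LSPC: one kernel model per class."""
    Phi_src = [features(x, X_tgt, sigma) for x in X_src]
    n, b = len(X_src), len(X_tgt)
    thetas = {}
    for c in classes:
        # h_c = mean_i w_i * [y_i == c] * phi_i  (weighted class-indicator target)
        h = [sum(w[i] * Phi_src[i][k] for i in range(n) if y_src[i] == c) / n
             for k in range(b)]
        thetas[c] = ridge(Phi_src, w, h, lam)
    labels = []
    for x in X_tgt:
        phi = features(x, X_tgt, sigma)
        # Clipped scores; dividing by their sum would give posterior estimates,
        # but the argmax alone is enough for a hard label.
        scores = {c: max(0.0, sum(t * p for t, p in zip(thetas[c], phi))) for c in classes}
        labels.append(max(classes, key=lambda c: scores[c]))
    return labels


def leveliw_stream(X0, y0, batches, classes, sigma=1.0, lam=0.1):
    """Hypothetical LEVELIW-style loop over a stream of unlabeled batches."""
    X_prev, y_prev, predictions = X0, y0, []
    for X_t in batches:
        w = ulsif_weights(X_prev, X_t, sigma, lam)  # match step t-1 to step t
        y_t = iwlspc_labels(X_prev, y_prev, w, X_t, classes, sigma, lam)
        predictions.append(y_t)
        X_prev, y_prev = X_t, y_t  # estimated labels train the next step
    return predictions


def make_batch(t, n_per_class=20, drift=0.3, seed=0):
    """Synthetic stream: two Gaussian classes whose means move up by `drift` per step."""
    rng = random.Random(seed + t)
    X, y = [], []
    for label, cx in ((0, -2.0), (1, 2.0)):
        for _ in range(n_per_class):
            X.append((rng.gauss(cx, 0.5), rng.gauss(drift * t, 0.5)))
            y.append(label)
    return X, y


if __name__ == "__main__":
    X0, y0 = make_batch(0)
    stream = [make_batch(t) for t in range(1, 6)]
    preds = leveliw_stream(X0, y0, [X for X, _ in stream], classes=[0, 1])
    for t, ((_, y_true), y_hat) in enumerate(zip(stream, preds), start=1):
        acc = sum(a == b for a, b in zip(y_true, y_hat)) / len(y_true)
        print(f"step {t}: accuracy {acc:.2f}")
```

Only the initial batch carries true labels; every later step trains on labels estimated at the step before, so a misclassified point is carried forward as a wrong training label. The sketch therefore only behaves well when the drift between consecutive batches is small relative to the class separation, as in the synthetic stream above.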
