Learning and modelling big data

Driven by powerful sensors, advanced digitalisation techniques, and dramatically increased storage capabilities, big data, in the sense of large or streaming data sets, very high dimensionality, or complex data formats, constitutes one of the major challenges faced by machine learning today. In this realm, several typical assumptions of machine learning can no longer be met, such as the possibility of processing all data in batch mode or the data being identically distributed. This calls for novel algorithmic developments and paradigm shifts, or for the adaptation of existing ones to cope with such situations. The goal of this tutorial is to give an overview of recent machine learning approaches for big data, with a focus on principled algorithmic ideas in the field.
