Signal Processing Techniques Restructure The Big Data Era

Big data science has become a topic that attracts attention from industry, academia, and governments. Its main objective is to recognize and extract meaningful information from huge amounts of heterogeneous and unstructured data (which constitute 95% of big data). Signal processing (SP) techniques and related statistical learning (SL) tools, such as principal component analysis (PCA), robust PCA (R-PCA), compressive sampling (CS), convex optimization (CO), stochastic approximation (SA), and kernel-based learning (KBL), are used to provide robustness, compression, and dimensionality reduction when addressing the challenges that Big Data raises. This review paper introduces Big Data-related SP techniques and presents applications of this emerging field.
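As a small illustration of the dimensionality reduction role that PCA plays among the tools listed above, the following is a minimal sketch in plain Python, not a method taken from the paper. The toy data set, the helper names (`center`, `covariance`, `top_component`, `project`), and the use of power iteration to find the leading principal component are all assumptions made for this example.

```python
import math
import random


def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (d x d) of rows that are already centered."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(cov, iters=200):
    """Leading eigenvector of a symmetric PSD matrix via power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def project(centered, v):
    """Coordinates of each centered row along the unit vector v."""
    return [sum(r[j] * v[j] for j in range(len(v))) for r in centered]


# Hypothetical toy data: 3-D points lying close to the line (1, 2, -1) * t.
rng = random.Random(0)
data = []
for _ in range(200):
    t = rng.gauss(0.0, 3.0)
    data.append([t + rng.gauss(0.0, 0.1),
                 2.0 * t + rng.gauss(0.0, 0.1),
                 -t + rng.gauss(0.0, 0.1)])

centered = center(data)
cov = covariance(centered)
v = top_component(cov)          # first principal direction
scores = project(centered, v)   # 1-D representation of the 3-D data

# Fraction of the total variance retained by the single component.
total_var = sum(cov[i][i] for i in range(len(cov)))
explained = (sum(s * s for s in scores) / (len(scores) - 1)) / total_var
print("principal direction:", [round(x, 3) for x in v])
print("explained variance ratio:", round(explained, 4))
```

Here each three-dimensional sample is replaced by a single score while almost all of the variance is retained, which is the compression effect the abstract attributes to PCA. For realistic data sizes, a library routine based on the singular value decomposition would normally replace the hand-written power iteration.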
