An Efficient High-Dimensional Big Data Storage Structure Based on US-ELM

With the rapid development of computer and Internet technologies, the volume of data in all walks of life is increasing sharply, and large amounts of high-dimensional big data are accumulating, such as network transaction data, user review data and multimedia data. The storage structure of high-dimensional big data is a critical factor that fundamentally affects processing performance. However, because of the very high dimensionality of such data, existing data storage techniques, such as row-store and column-store, are not well suited to high-dimensional, large-scale data. Therefore, in this paper, we present HB-File (High-dimensional Big Data File), an efficient storage structure for high-dimensional big data based on US-ELM (unsupervised extreme learning machine). HB-File is a hybrid storage model that combines row-store and column-store. Extensive experiments demonstrate the effectiveness of HB-File for storing high-dimensional big data.
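The abstract describes HB-File only as a hybrid of row-store and column-store and does not say how US-ELM shapes the layout. As a rough illustration of the general idea of a hybrid layout (not HB-File itself), the Python sketch below partitions attributes into column groups, stores each group row by row, and counts how many values a query scans and how many separate groups it must stitch together. The column groups are chosen by hand as a hypothetical stand-in for whatever grouping HB-File derives, and the function names are illustrative assumptions.

```python
from typing import List, Sequence, Set

Rows = List[Sequence[float]]
Groups = List[List[int]]  # each inner list holds the attribute indices of one column group


def to_hybrid_store(rows: Rows, groups: Groups) -> List[List[List[float]]]:
    """Partition attributes into column groups; inside a group, store values row by row.

    With one group holding every attribute this degenerates to a row-store;
    with one group per attribute it degenerates to a column-store.
    """
    return [[[row[c] for c in group] for row in rows] for group in groups]


def values_scanned(groups: Groups, n_rows: int, wanted: Set[int]) -> int:
    """Values read by a query over `wanted`: every group containing at least one
    requested attribute is scanned in full (all of its attributes, all rows)."""
    return sum(len(g) * n_rows for g in groups if wanted & set(g))


def groups_touched(groups: Groups, wanted: Set[int]) -> int:
    """Separate groups a query must access and stitch together (a rough proxy
    for seek and tuple-reconstruction cost)."""
    return sum(1 for g in groups if wanted & set(g))


if __name__ == "__main__":
    rows = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
    layouts = {
        "row-store": [[0, 1, 2, 3]],
        "column-store": [[0], [1], [2], [3]],
        "hybrid": [[0, 1], [2, 3]],  # hypothetical grouping, chosen by hand
    }
    print(to_hybrid_store(rows, layouts["hybrid"]))
    for name, groups in layouts.items():
        # Query A reads attributes 0 and 1; query B reconstructs whole records.
        print(name,
              values_scanned(groups, len(rows), {0, 1}),
              groups_touched(groups, {0, 1, 2, 3}))
```

On this toy input, the hybrid layout scans 4 values for the query on attributes 0 and 1, the same as the column-store and half of the row-store's 8. It also reconstructs a full record from 2 groups instead of the column-store's 4. This is the kind of trade-off a hybrid model aims to balance; how well a given grouping performs depends on the workload.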
