An Efficient High-Dimensional Big Data Storage Structure Based on US-ELM

With the rapid development of computer and Internet technologies, the amount of data in all walks of life has increased sharply, and large volumes of high-dimensional big data, such as network transaction data, user review data, and multimedia data, have accumulated. The storage structure of high-dimensional big data is a critical factor that fundamentally affects processing performance. However, because of the very high dimensionality of such data, existing storage techniques such as row-store and column-store are not well suited to high-dimensional, large-scale data. Therefore, in this paper we present an efficient storage structure for high-dimensional big data based on US-ELM (the unsupervised extreme learning machine), called the High-dimensional Big Data File (HB-File), which is a hybrid of row-store and column-store. Extensive experiments demonstrate the effectiveness of HB-File for storing high-dimensional big data.
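
The abstract names the row-store/column-store trade-off but not how HB-File lays out data or where US-ELM enters. The following Python sketch illustrates only the generic idea of a hybrid layout, assuming a simple column-group design: the dimensions are partitioned into groups, and each group is stored row by row, so a pure row-store and a pure column-store are the two extreme groupings. The names `HybridStore` and `project` and the hand-picked groups are hypothetical; this is not HB-File's actual format, and the step in which US-ELM would decide the grouping is left out.

```python
# Minimal sketch of a column-group ("hybrid") layout. HybridStore and project
# are hypothetical names; this is NOT HB-File's format, and the groups are
# chosen by hand rather than by US-ELM.
from typing import Dict, List, Sequence, Tuple


class HybridStore:
    """Partition the dimensions into groups and store each group row by row.

    groups=[[0, ..., d-1]] degenerates to a pure row-store;
    groups=[[0], [1], ..., [d-1]] degenerates to a pure column-store.
    """

    def __init__(self, rows: Sequence[Sequence[float]], groups: List[List[int]]):
        n_dims = len(rows[0])
        # every dimension must appear in exactly one group
        if sorted(d for g in groups for d in g) != list(range(n_dims)):
            raise ValueError("groups must partition dimensions 0..n_dims-1")
        self.n_rows = len(rows)
        self.groups = [list(g) for g in groups]
        # dimension -> (group index, offset inside that group)
        self.where: Dict[int, Tuple[int, int]] = {
            d: (gi, pos) for gi, g in enumerate(self.groups) for pos, d in enumerate(g)
        }
        # one row-major block per group: blocks[gi][row][offset]
        self.blocks: List[List[List[float]]] = [
            [[r[d] for d in g] for r in rows] for g in self.groups
        ]

    def project(self, dims: Sequence[int]) -> Tuple[List[Tuple[float, ...]], int, int]:
        """Return the requested dimensions of every row, the number of stored
        values read, and the number of blocks touched. A block is row-major,
        so touching any dimension of a group reads the whole group."""
        touched = sorted({self.where[d][0] for d in dims})
        values_read = sum(self.n_rows * len(self.groups[gi]) for gi in touched)
        result = [
            tuple(self.blocks[self.where[d][0]][i][self.where[d][1]] for d in dims)
            for i in range(self.n_rows)
        ]
        return result, values_read, len(touched)


if __name__ == "__main__":
    # 4 rows x 6 dimensions; row i holds 10*i + d in dimension d
    rows = [[float(10 * i + d) for d in range(6)] for i in range(4)]
    layouts = {
        "row-store": [[0, 1, 2, 3, 4, 5]],
        "column-store": [[0], [1], [2], [3], [4], [5]],
        "hybrid": [[0, 1], [2, 3, 4], [5]],
    }
    for name, groups in layouts.items():
        store = HybridStore(rows, groups)
        _, read, blocks = store.project([0, 1])
        print(f"{name}: {read} values read from {blocks} block(s)")
```

With these example groupings, projecting dimensions 0 and 1 reads 24 values from one block under the row-store layout, 8 values from two blocks under the column-store layout, and 8 values from one block under the hybrid layout. Reconstructing whole rows reads all 24 values in every layout but touches 1, 6, and 3 blocks respectively, which is the kind of trade-off a hybrid layout tries to balance.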