Paper Information - Scalable Distributed Virtual Data Structures

Scalable Distributed Virtual Data Structures

Big data stored in scalable, distributed data structures is now popular. We extend the idea to big, virtual data. Big, virtual data is not stored, but materialized one record at a time in the nodes used by a scalable, distributed, virtual data structure spanning thousands of nodes. The necessary cloud infrastructure is now available for general use. The records are used by some big computation that scans every record and retains (or aggregates) only a few, based on criteria provided by the client. The client sets a limit on the time the scan takes at each node, for example 10 minutes. We define here two scalable distributed virtual data structures called VH* and VR*. They use, respectively, hash and range partitioning. While scan speed can differ between nodes, both structures select the smallest number of nodes necessary to perform the scan in the allotted time R. We show the usefulness of our structures by applying them to the problem of recovering an encryption key and to the classic knapsack problem.
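The idea of covering a virtual record space with just enough nodes for a per-node time limit R can be illustrated with a minimal sketch. It is not the VH* or VR* algorithm from the paper. It assumes range partitioning over record indices 0..total_records-1, a known scan speed per node (records per second), and nodes taken greedily in the order given. The names `plan_ranges`, `scan_range`, `materialize`, and `keep` are hypothetical and introduced only for this illustration.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def plan_ranges(total_records: int, speeds: List[float],
                time_limit: float) -> List[Tuple[int, int]]:
    """Assign consecutive index ranges [start, end) to nodes, in list order.

    Each node receives at most speed * time_limit records, so that it can
    finish its scan within the time limit. Nodes are added only until the
    whole index space is covered (assumption: speeds are known up front).
    """
    ranges: List[Tuple[int, int]] = []
    start = 0
    for speed in speeds:
        if start >= total_records:
            break  # everything is covered; remaining nodes are not used
        capacity = int(speed * time_limit)
        if capacity <= 0:
            continue  # a node that cannot scan anything is skipped
        end = min(start + capacity, total_records)
        ranges.append((start, end))
        start = end
    if start < total_records:
        raise ValueError("not enough node capacity to finish within the limit")
    return ranges


def scan_range(start: int, end: int,
               materialize: Callable[[int], T],
               keep: Callable[[T], bool]) -> List[T]:
    """Materialize each virtual record from its index and retain the matches.

    Nothing is stored: a record exists only while it is being tested.
    """
    return [rec for rec in (materialize(i) for i in range(start, end))
            if keep(rec)]


# Hypothetical example: 1000 virtual records, four candidate nodes,
# and a 30-second limit per node.
plan = plan_ranges(1000, [10.0, 5.0, 20.0, 8.0], 30.0)
hits = scan_range(0, 10, lambda i: i * i, lambda r: r % 2 == 0)
```

In this example the first three nodes have capacities of 300, 150, and 600 records, which is enough to cover all 1000 indices, so the fourth node is never allocated. Taking nodes in list order is a simplification; a real system would have to discover each node's speed at run time.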

Sushil Jajodia | Witold Litwin | Thomas Schwarz
