Analytical Design of the DIS Architecture: The Hybrid Model

Over the last few decades, the spread of Internet appliances has driven a sharp increase in data usage, with a strong impact on storage and mining technologies. Scientific and research fields in particular produce a heterogeneous mix of structured, semi-structured, and unstructured data, and increasingly demanding requirements have made processing such data more intensive. Existing technologies address these challenges and deliver scalable services through effective physical infrastructure (for mining), smart networking solutions, and suitable software approaches. Cloud computing, in particular, targets data-intensive computing by facilitating scalable processing of huge data sets. The problem nevertheless remains open for very large data, and data volumes continue to grow exponentially. For this setting, the well-known MapReduce (MR) model is the recommended approach for processing huge, voluminous data. On its own, however, the model offers limited fault tolerance and reliability, a limitation that the Hadoop architecture can overcome. Hadoop is fault-tolerant and provides high throughput, which makes it suitable for applications with large data sets and for file systems that require streaming access. This paper examines which architectural and design changes are needed to combine the benefits of the Everest model, HBase, and existing MR algorithms.
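For readers unfamiliar with the MapReduce model named above, the following is a minimal single-process sketch of its three phases (map, shuffle, reduce) applied to word counting. It is an illustration only, not the architecture proposed in the paper; the function names `map_phase`, `shuffle`, `reduce_phase`, and `run_job` are hypothetical and do not correspond to any Hadoop, HBase, or Everest API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Map: emit one (word, 1) pair per word in an input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(records: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence in one process.

    A real framework would distribute these phases across nodes and
    re-run failed tasks; this sketch does neither.
    """
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["big data", "Big cloud"]))
```

In a distributed deployment the map and reduce functions keep this shape, while the framework takes over partitioning, scheduling, and recovery from task failures.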