High-availability clusters: A taxonomy, survey, and future directions

[1]  M. Colajanni,et al.  Scalable, Confidential and Survivable Software Updates , 2022, IEEE Transactions on Parallel and Distributed Systems.

[2]  Sukhpal Singh Gill,et al.  Fog computing: A taxonomy, systematic review, current trends and research challenges , 2021, J. Parallel Distributed Comput..

[3]  Sharon Bassan A Shared Responsibility Model , 2021 .

[4]  Bruno Nogueira,et al.  Cloud storage availability and performance assessment: a study based on NoSQL DBMS , 2021, J. Supercomput..

[5]  Ricardo Bianchini,et al.  Flex: High-Availability Datacenters With Zero Reserved Power , 2021, 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA).

[6]  Neeraj Kumar,et al.  Storage as a service in Fog computing : A systematic review , 2021, J. Syst. Archit..

[7]  Eduardo K. Viegas,et al.  A Machine Learning Model for Detection of Docker-based APP Overbooking on Kubernetes , 2021, ICC 2021 - IEEE International Conference on Communications.

[8]  José Maria Monteiro,et al.  Main Memory Database Recovery , 2021, ACM Comput. Surv..

[9]  Deron Liang,et al.  High-Availability Computing Platform with Sensor Fault Resilience , 2021, Sensors.

[10]  Samuel Madden,et al.  Epoch-based Commit and Replication in Distributed OLTP Databases , 2021, Proc. VLDB Endow..

[11]  J. E. Weber The Load-In , 2020 .

[12]  Nikos Parlavantzas,et al.  Active-Standby for High-Availability in FaaS , 2020, WOSC@Middleware.

[13]  Yueming Lu,et al.  Distributed File System Multilevel Fault-Tolerant High Availability Mechanism , 2020, CIAT.

[14]  Younghan Kim,et al.  Dynamic Policy Management System for High Availability in a Multi-site Cloud , 2020, 2020 International Conference on Information and Communication Technology Convergence (ICTC).

[15]  Pablo Fondo-Ferreiro,et al.  A Software-Defined Networking Solution for Transparent Session and Service Continuity in Dynamic Multi-Access Edge Computing , 2020, IEEE Transactions on Network and Service Management.

[16]  Younghan Kim,et al.  Design and Implementation of Fast Fault Detection in Cloud Infrastructure for Containerized IoT Services , 2020, Sensors.

[17]  Nuno Preguiça,et al.  Practical client-side replication , 2020, VLDB 2020.

[18]  Jinlong Lin,et al.  A SCSI3 Persistent Reservation Synchronization Solution for iSCSI Targets Cluster Hosting Ceph RBD with Active/Active Connections , 2020, 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC).

[19]  Zepeng Wen,et al.  Design and Implementation of High-availability PaaS Platform Based on Virtualization Platform , 2020, 2020 IEEE 5th Information Technology and Mechatronics Engineering Conference (ITOEC).

[20]  Miroslaw Malek,et al.  Impact of Failure Prediction on Availability: Modeling and Comparative Analysis of Predictive and Reactive Methods , 2020, IEEE Transactions on Dependable and Secure Computing.

[21]  Eyal de Lara,et al.  SessionStore: A Session-Aware Datastore for the Edge , 2020, 2020 IEEE 4th International Conference on Fog and Edge Computing (ICFEC).

[22]  Glauco Estácio Gonçalves,et al.  Availability analysis of design configurations to compose virtual performance‐optimized data center systems in next‐generation cloud data centers , 2020, Softw. Pract. Exp..

[23]  Vilém Pechanec,et al.  Evaluation of Replication Mechanisms on Selected Database Systems , 2020, ISPRS Int. J. Geo Inf..

[24]  Jameleddine Hassine,et al.  Automatic retrieval and analysis of high availability scenarios from system execution traces: A case study on hot standby router protocol , 2020, J. Syst. Softw..

[25]  Lisandro Zambenedetti Granville,et al.  FT-Aurora: A highly available IaaS cloud manager based on replication , 2020, Comput. Networks.

[26]  Leila Abdollahi Vayghan,et al.  Microservice Based Architecture: Towards High-Availability for Stateful Applications with Kubernetes , 2019, 2019 IEEE 19th International Conference on Software Quality, Reliability and Security (QRS).

[27]  J. Kelner,et al.  Resource allocation based on redundancy models for high availability cloud , 2019, Computing.

[28]  Anjali Agarwal,et al.  High Availability Management for Applications Services in the Cloud Container-Based Platform , 2018, 2018 IEEE/ACS 15th International Conference on Computer Systems and Applications (AICCSA).

[29]  Jason P. Jue,et al.  All One Needs to Know about Fog Computing and Related Edge Computing Paradigms , 2019 .

[30]  Rajkumar Buyya,et al.  Data Storage Management in Cloud Environments , 2017, ACM Comput. Surv..

[31]  Wook-Shin Han,et al.  Parallel Replication across Formats in SAP HANA for Scaling Out Mixed OLTP/OLAP Workloads , 2017, Proc. VLDB Endow..

[32]  Judith Kelner,et al.  Analyzing the IT subsystem failure impact on availability of cloud services , 2017, 2017 IEEE Symposium on Computers and Communications (ISCC).

[33]  Maria Toeroe,et al.  Comparing Pacemaker with OpenSAF for Availability Management in the Cloud , 2017, 2017 IEEE International Conference on Edge Computing (EDGE).

[34]  Ignacio Martín Llorente,et al.  A High-Availability Cloud for Research Computing , 2017, Computer.

[35]  Madhusudhan Govindaraju,et al.  Electron: Towards Efficient Resource Management on Heterogeneous Clusters with Apache Mesos , 2017, 2017 IEEE 10th International Conference on Cloud Computing (CLOUD).

[36]  Danny Weyns,et al.  Engineering Trustworthy Self-Adaptive Software with Dynamic Assurance Cases , 2017, IEEE Transactions on Software Engineering.

[37]  J. Kelner,et al.  High availability in clouds: systematic review and research challenges , 2016, Journal of Cloud Computing.

[38]  Bin Sheng,et al.  High-availability deployment for large enterprises , 2016, 2016 International Conference on Progress in Informatics and Computing (PIC).

[39]  John R. Vacca,et al.  Cloud Computing Security Foundations and Challenges , 2018 .

[40]  Fernando Pedone,et al.  Dynamic Scalable State Machine Replication , 2016, 2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).

[41]  Jianming Yong,et al.  A High Availability Application Service Platform for nuclear power enterprises , 2016, 2016 IEEE 20th International Conference on Computer Supported Cooperative Work in Design (CSCWD).

[42]  Dong Seong Kim,et al.  Availability modeling and analysis of a data center for disaster tolerance , 2016, Future Gener. Comput. Syst..

[43]  Claus Pahl,et al.  A Database-Specific Pattern for Multi-cloud High Availability and Disaster Recovery , 2015, ESOCC Workshops.

[44]  Abdelouahed Gherbi,et al.  Leveraging Linux Containers to Achieve High Availability for Cloud Services , 2015, 2015 IEEE International Conference on Cloud Engineering.

[45]  Hailong Sun,et al.  On the tradeoff of availability and consistency for quorum systems in data center networks , 2015, Comput. Networks.

[46]  Adam Barker,et al.  Observing the clouds: a survey and taxonomy of cloud monitoring , 2014, Journal of Cloud Computing.

[47]  Terry Critchley,et al.  High Availability IT Services , 2014 .

[48]  F. Khendek,et al.  Comparing redundancy models for high availability middleware , 2014, Computing.

[49]  Yonggang Wen,et al.  Toward Scalable Systems for Big Data Analytics: A Technology Tutorial , 2014, IEEE Access.

[50]  Cees T. A. M. de Laat,et al.  Defining architecture components of the Big Data Ecosystem , 2014, 2014 International Conference on Collaboration Technologies and Systems (CTS).

[51]  A. Gómez,et al.  Fault-tolerant virtual cluster experiments on federated sites using BonFIRE , 2014, Future Gener. Comput. Syst..

[52]  Gregory Levitin,et al.  Cold vs. hot standby mission operation cost minimization for 1-out-of-N systems , 2014, Eur. J. Oper. Res..

[53]  Mário M. Freire,et al.  Security issues in cloud environments: a survey , 2014, International Journal of Information Security.

[54]  Bu-Sung Lee,et al.  Software defined network based adaptive routing for data replication in Data Centers , 2013, 2013 19th IEEE International Conference on Networks (ICON).

[55]  Matjaz B. Juric,et al.  Towards a unified taxonomy and architecture of cloud frameworks , 2013, Future Gener. Comput. Syst..

[56]  Yves Lemieux,et al.  Achieving High Availability at the Application Level in the Cloud , 2013, 2013 IEEE Sixth International Conference on Cloud Computing.

[57]  Andrew Warfield,et al.  RemusDB: transparent high availability for database systems , 2011, The VLDB Journal.

[58]  P. R. Anisha,et al.  Controlling of Data in the Cloud , 2012 .

[59]  Kevin P. Murphy,et al.  Machine learning - a probabilistic perspective , 2012, Adaptive computation and machine learning series.

[60]  Maria Toeroe,et al.  Service Availability: Principles and Practice , 2012 .

[61]  Ulrik Franke,et al.  Optimal IT Service Availability: Shorter Outages, or Fewer? , 2012, IEEE Transactions on Network and Service Management.

[62]  Kenneth P. Birman,et al.  Guide to Reliable Distributed Systems , 2012, Texts in Computer Science.

[63]  Fang Liu,et al.  NIST Cloud Computing Reference Architecture , 2011, 2011 IEEE World Congress on Services.

[64]  Francesco Longo,et al.  Availability Assessment of HA Standby Redundant Clusters , 2010, 2010 29th IEEE Symposium on Reliable Distributed Systems.

[65]  Xin Chen,et al.  Symmetric active/active metadata service for high availability parallel file systems , 2009, J. Parallel Distributed Comput..

[66]  Rafael M. Gasca,et al.  Demystifying Cluster-Based Fault-Tolerant Firewalls , 2009, IEEE Internet Computing.

[67]  Carlo Vercellis,et al.  Business Intelligence: Data Mining and Optimization for Decision Making , 2009 .

[68]  Haixun Wang,et al.  Online Anomaly Prediction for Robust Cluster Systems , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[69]  Tiziana Margaria,et al.  High Service Availability in MaTRICS for the OCS , 2008, ISoLA.

[70]  Eli M. Dow,et al.  Leveraging virtualization to optimize high-availability system configurations , 2008, IBM Syst. J..

[71]  Jürgen M. Schneider,et al.  From high availability and disaster recovery to business continuity solutions , 2008, IBM Syst. J..

[72]  Dutch T. Meyer,et al.  Remus: High Availability via Asynchronous Virtual Machine Replication. (Best Paper) , 2008, NSDI.

[73]  Yong-Ju Lee,et al.  A Stochastic Availability Prediction Model for Head Nodes in the HA Cluster , 2008, 22nd International Conference on Advanced Information Networking and Applications - Workshops (aina workshops 2008).

[74]  Susan Snedaker,et al.  Business Continuity and Disaster Recovery Planning for IT Professionals , 2007 .

[75]  Chang-Sheng Xie,et al.  High Availability Cluster with Combining NAS and ISCSI , 2006, 2006 International Conference on Machine Learning and Cybernetics.

[76]  Xubin He,et al.  Symmetric Active/Active High Availability for High-Performance Computing System Services: Accomplishments and Limitations , 2006, 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID).

[77]  Davide Rossi,et al.  Analyzing the Impact of Components Replication in High Available J2EE Clusters , 2005, Joint International Conference on Autonomic and Autonomous Systems and International Conference on Networking and Services - (icas-isns'05).

[78]  Haw Ching Yang,et al.  Application Cluster Service Scheme for Near-Zero-Downtime Services , 2005, Proceedings of the 2005 IEEE International Conference on Robotics and Automation.

[79]  Hao Wang,et al.  Architectural design and implementation of highly available and scalable medical system with IBM Websphere middleware , 2004, Proceedings. 17th IEEE Symposium on Computer-Based Medical Systems.

[80]  Robbert van Renesse,et al.  Adding high availability and autonomic behavior to Web services , 2004, Proceedings. 26th International Conference on Software Engineering.

[81]  Stephen L. Scott,et al.  HA-OSCAR: the birth of highly available OSCAR , 2003 .

[82]  A. Schiper,et al.  Total order broadcast and multicast algorithms: Taxonomy and survey , 2003, CSUR.

[83]  Rachid Guerraoui,et al.  The Database State Machine Approach , 2003, Distributed and Parallel Databases.

[84]  Jorge L. Romeu,et al.  Practical Reliability Engineering , 2003, Technometrics.

[85]  Jim Noble,et al.  Check Point NG VPN-1/Firewall-1: Advanced Configuration and Troubleshooting , 2003 .

[86]  Vladimir Getov,et al.  Intelligent architecture for automatic resource allocation in computer clusters , 2003, Proceedings International Parallel and Distributed Processing Symposium.

[87]  Xiaoshe Dong,et al.  Design and implementation of heartbeat in multi-machine environment , 2003, 17th International Conference on Advanced Information Networking and Applications, 2003. AINA 2003..

[88]  Dilip M. Ranade,et al.  Shared Data Clusters: Scaleable, Manageable, and Highly Available Systems (VERITAS Series) , 2002 .

[89]  Tim Burke,et al.  A high-availability clustering architecture with data integrity guarantees , 2001, Proceedings 42nd IEEE Symposium on Foundations of Computer Science.

[90]  Matthew T. O'Keefe,et al.  Scalability and Failure Recovery in a Linux Cluster File System , 2000, Annual Linux Showcase & Conference.

[91]  Rajkumar Buyya,et al.  High Performance Cluster Computing: Architectures and Systems , 1999 .

[92]  Kenneth P. Birman,et al.  The design and architecture of the Microsoft Cluster Service-a practical approach to high-availability and scalability , 1998, Digest of Papers. Twenty-Eighth Annual International Symposium on Fault-Tolerant Computing (Cat. No.98CB36224).

[93]  Michael K. Reiter,et al.  Probabilistic quorum systems , 1997, PODC '97.

[94]  Danny Dolev,et al.  The Transis approach to high availability cluster communication , 1996, CACM.

[95]  Moni Naor,et al.  The load, capacity and availability of quorum systems , 1994, Proceedings 35th Annual Symposium on Foundations of Computer Science.

[96]  P. A. Carter High Availability and Disaster Recovery Concepts , 2020, SQL Server 2019 AlwaysOn.

[97]  Nuno M. Preguiça,et al.  Practical Client-side Replication: Weak Consistency Semantics for Insecure Settings , 2020, Proc. VLDB Endow..

[98]  Hemant Saxena,et al.  A Cloud-native Architecture for Replicated Data Services , 2020, HotCloud.

[99]  Xiaobo Zhou,et al.  Edge Computing in Industrial Internet of Things: Architecture, Advances and Challenges , 2020, IEEE Communications Surveys & Tutorials.

[100]  Premathas Somasekaram A Component-based Business Continuity and Disaster Recovery Framework , 2017 .

[101]  Maria Toeroe,et al.  Availability in the cloud: State of the art , 2016, J. Netw. Comput. Appl..

[102]  Mitsuhiro Iriya Clustering Solutions for Achieving High Availability for Diversifying Platforms : The Future in Advanced Best Practices Sponsored by : NEC , 2016 .

[103]  Nick Heudecker,et al.  Magic Quadrant for Operational Database Management Systems , 2015 .

[104]  Oracle Solaris and Oracle Solaris Cluster : Extending Oracle Solaris for Business Continuity , 2013 .

[105]  Luiz André Barroso,et al.  The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines , 2009, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines.

[106]  Christian Engelmann,et al.  JOSHUA: Symmetric Active/Active Replication for Highly Available HPC Job and Resource Management , 2006, 2006 IEEE International Conference on Cluster Computing.

[107]  C. Leangsuksun,et al.  Asymmetric Active-Active High Availability for High-end Computing , 2005 .

[108]  Jon Paul Maloy,et al.  TIPC: Providing Communication for Linux Clusters , 2004 .

[109]  S. Scott,et al.  A Failure Predictive and Policy-Based High Availability Strategy for Linux High Performance Computing Cluster , 2004 .

[110]  Carl E. Landwehr,et al.  Basic concepts and taxonomy of dependable and secure computing , 2004, IEEE Transactions on Dependable and Secure Computing.

[111]  Tong Liu,et al.  Availability prediction and modeling of high mobility OSCAR cluster , 2003, 2003 Proceedings IEEE International Conference on Cluster Computing.

[112]  Marcel Waldvogel,et al.  Efficient topology-aware overlay network , 2003, CCRV.

[113]  Evan Marcus,et al.  Blueprints for high availability , 2000 .

[114]  A. Singh,et al.  Fault-tolerant systems , 1990, Computer.