An Optimal and Distributed Checkpointing and Replication based Fault-tolerant Strategy for Reliable Cloud Computing
Keywords:
Cloud computing, fault-tolerant, optimal checkpointing, replication, virtual machines (VMs)
Abstract
Cloud computing is an emerging technology that plays a major role in information technology. Its reliance on virtualization and on the Internet exposes it to a variety of failures, which makes reliability and availability major concerns. To ensure adequate reliability and availability of the cloud, an efficient fault-tolerance strategy needs to be developed and implemented. The majority of earlier fault-tolerant approaches relied on only one method for tolerating faults. This paper presents an efficient and effective fault-tolerant strategy for cloud computing environments. The strategy is based on an optimal and distributed checkpointing and replication scheme that provides a reliable cloud platform for carrying out customer requests. It also determines the best fault-tolerance strategy for every selected virtual machine (VM). Simulation experiments are carried out to evaluate the performance of the strategy. The experimental results show that the proposed fault-tolerant strategy improves cloud performance in terms of overheads, throughput, availability, and maintenance cost.
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license lets others remix, transform, or build upon the material, provided they give appropriate credit, provide a link to the license, indicate if changes were made, and distribute their contributions under the same license as the original.