An Optimal and Distributed Checkpointing and Replication based Fault-tolerant Strategy for Reliable Cloud Computing

Authors

  • M. Damodhar, Ch. D. V. Subba Rao

Keywords

Cloud computing, fault-tolerant, optimal checkpointing, replication, virtual machines (VMs).

Abstract

Cloud computing is an emerging technology that plays a major role in information technology. Its reliance on virtualization and on the Internet exposes it to a variety of failures, which makes reliability and availability major concerns. To ensure adequate reliability and availability of the cloud, an efficient fault tolerance strategy needs to be developed and implemented. Most earlier fault-tolerant approaches relied on a single method for tolerating faults. This paper presents a fault-tolerant strategy for cloud computing environments that combines optimal, distributed checkpointing with replication to provide a reliable cloud platform for serving customer requests. The strategy also determines the best fault tolerance method for each selected virtual machine (VM). Simulation experiments are carried out to evaluate its performance. The results show that the proposed strategy improves cloud performance in terms of overheads, throughput, availability, and maintenance cost.

Published

06.08.2024

How to Cite

M. Damodhar. (2024). An Optimal and Distributed Checkpointing and Replication based Fault-tolerant Strategy for Reliable Cloud Computing. International Journal of Intelligent Systems and Applications in Engineering, 12(23s), 683 –. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/6962

Section

Research Article