Exploring the Limits of Raft's Fault Tolerance: Insights from Simulated Network Partitions

Authors

  • Kiran Kumar Kondru, Saranya Rajiakodi

Keywords:

Raft Consensus; Fault Tolerance; Liveness; Discrete Event Simulation

Abstract

This paper examines the robustness of the Raft consensus algorithm, focusing on its fault tolerance and on the challenges it faces under network partitions and node failures. The study analyzes the mechanisms Raft uses to keep data consistent across distributed databases. UML diagrams, followed by simulations, illustrate the leader election process and the fault tolerance operations within Raft. The paper concentrates on edge-case failure scenarios, illustrating them with sequence diagrams and complementing them with graphs of results from a discrete event simulation of Raft's leader election. Using a custom-built discrete event simulator, we explore Raft's lesser-known failure cases, complementing previous studies on consensus mechanisms. The study probes the limits of Raft's liveness and gives a broader picture that makes the algorithm's behavior under failure easier to understand.
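To make the idea of simulating leader election under a partition concrete, the following is a minimal sketch and not the authors' simulator. It assumes a fixed partition, a constant one-way message latency, and election timeouts drawn uniformly from 150 to 300 time units; the function name `simulate_election` and all parameters are hypothetical. Log replication, heartbeats, and node crashes are omitted.

```python
import heapq
import random


def simulate_election(n_nodes, partition, seed=0, max_time=2000.0, latency=1.0):
    """Toy discrete event simulation of Raft-style leader election.

    partition: list of sets of node ids; messages are only delivered
    between nodes in the same set (an assumed, static partition).
    Returns (leader_id, term, time) for the first leader elected,
    or None if no node gathers a majority before max_time.
    """
    rng = random.Random(seed)
    group_of = {n: i for i, g in enumerate(partition) for n in g}
    term = [0] * n_nodes
    voted_for = [None] * n_nodes
    votes = [set() for _ in range(n_nodes)]
    deadline = [0.0] * n_nodes
    events = []  # heap of (time, seq, kind, dst, src, msg_term)
    seq = 0
    majority = n_nodes // 2 + 1  # majority of the full cluster

    def schedule(t, kind, dst, src=None, msg_term=0):
        nonlocal seq
        heapq.heappush(events, (t, seq, kind, dst, src, msg_term))
        seq += 1

    def reset_timer(node, now):
        # Randomized election timeout (assumed range: 150 to 300 units).
        deadline[node] = now + rng.uniform(150.0, 300.0)
        schedule(deadline[node], "timeout", node)

    for node in range(n_nodes):
        reset_timer(node, 0.0)

    while events:
        now, _, kind, dst, src, msg_term = heapq.heappop(events)
        if now > max_time:
            break
        if kind == "timeout":
            if now < deadline[dst]:
                continue  # stale timer: it was reset after being scheduled
            # Become a candidate: new term, vote for self, ask reachable peers.
            term[dst] += 1
            voted_for[dst] = dst
            votes[dst] = {dst}
            if len(votes[dst]) >= majority:
                return dst, term[dst], now
            for peer in range(n_nodes):
                if peer != dst and group_of[peer] == group_of[dst]:
                    schedule(now + latency, "request_vote", peer, dst, term[dst])
            reset_timer(dst, now)
        elif kind == "request_vote":
            if msg_term > term[dst]:
                term[dst] = msg_term
                voted_for[dst] = None
            # Grant at most one vote per term.
            if msg_term == term[dst] and voted_for[dst] in (None, src):
                voted_for[dst] = src
                reset_timer(dst, now)
                schedule(now + latency, "vote", src, dst, msg_term)
        elif kind == "vote":
            if msg_term == term[dst]:
                votes[dst].add(src)
                if len(votes[dst]) >= majority:
                    return dst, term[dst], now
    return None


# A 3/2 split: only the three-node side can reach a majority of five.
print(simulate_election(5, [{0, 1, 2}, {3, 4}]))
# A 2/2/1 split: no side holds a majority, so no leader can be elected.
print(simulate_election(5, [{0, 1}, {2, 3}, {4}]))
```

In this sketch a candidate needs votes from a majority of the whole cluster, not of its own partition. That is why the minority side of a split keeps timing out and incrementing its term without ever electing a leader.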

Published

09.07.2024

How to Cite

Kiran Kumar Kondru. (2024). Exploring the Limits of Raft’s Fault Tolerance: Insights from Simulated Network Partitions. International Journal of Intelligent Systems and Applications in Engineering, 12(22s), 838 –. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/6564

Section

Research Article