Exploring the Limits of Raft's Fault Tolerance: Insights from Simulated Network Partitions
Keywords:
Raft Consensus; Fault Tolerance; Aliveness; Discrete Event Simulation
Abstract
This paper examines the robustness of the Raft consensus algorithm, focusing on its fault tolerance and on the challenges it faces under network partitions and node failures. It analyzes the mechanisms Raft uses to keep data consistent across distributed databases. Through detailed UML diagrams followed by simulations, the work illustrates the leader election process and the fault-tolerance operations within Raft. It concentrates on edge-case failure scenarios, which are illustrated with sequence diagrams and complemented by graphs of results from a discrete event simulation of Raft's leader election. Using a custom-built discrete event simulator, we explore Raft's lesser-known failure cases, complementing previous studies of consensus mechanisms. The study probes the limits of Raft's liveness and offers a broader picture that makes the algorithm easier to understand.
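To make the idea of a discrete event simulation of leader election under a partition concrete, the following is a minimal sketch and not the authors' simulator. The function name `simulate_election`, the 150 to 300 ms timeout range, and the simplifications (instantaneous messages, no log comparison, no timer reset when a vote is granted, static partition) are all assumptions made for illustration only.

```python
import heapq
import random


def simulate_election(n_nodes, partition, horizon_ms=5000, seed=0):
    """Return (leader, term, time_ms) for the first elected leader, or None.

    partition is a list of sets of node ids; a node can only reach
    nodes in its own set. This is a hypothetical, simplified model.
    """
    rng = random.Random(seed)
    group_of = {node: idx for idx, group in enumerate(partition) for node in group}
    current_term = {node: 0 for node in range(n_nodes)}
    voted_in_term = {node: 0 for node in range(n_nodes)}  # highest term voted in
    majority = n_nodes // 2 + 1  # majority of the FULL cluster, not of the partition

    # Event queue ordered by time; each event is an election timeout firing.
    events = []
    for node in range(n_nodes):
        heapq.heappush(events, (rng.uniform(150, 300), node))

    while events:
        now, candidate = heapq.heappop(events)
        if now > horizon_ms:
            break
        # The candidate starts a new term and votes for itself.
        current_term[candidate] += 1
        term = current_term[candidate]
        voted_in_term[candidate] = term
        votes = 1
        for peer in range(n_nodes):
            if peer == candidate or group_of[peer] != group_of[candidate]:
                continue  # the request never arrives across the partition
            if voted_in_term[peer] < term:  # at most one vote per term
                voted_in_term[peer] = term
                current_term[peer] = term
                votes += 1
        if votes >= majority:
            return candidate, term, now
        # Election failed: re-arm the candidate's randomized timeout.
        heapq.heappush(events, (now + rng.uniform(150, 300), candidate))
    return None
```

Under these assumptions, a five-node cluster split as `[{0, 1, 2}, {3, 4}]` elects a leader on the three-node side, while a split such as `[{0, 1}, {2, 3}, {4}]` returns `None`: no group can reach three votes, so candidates keep timing out and incrementing their terms until the horizon is reached.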
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license lets readers share and adapt the material, provided they give appropriate credit, provide a link to the license, and indicate if changes were made. If they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.