Self-Healing CI/CD Pipelines with Feedback-Loop Automation: Building Fault-Tolerant CI/CD Systems Using Anomaly Detection and Automated Rollback Logic

Authors

  • Alekhya Challa, Mahesh Reddy Konatham

Keywords:

autonomous, CI/CD, MTTR, resilience, comprehensive

Abstract

Modern continuous integration and delivery (CI/CD) pipelines are crucial for rapid software releases, yet they risk introducing failures into production. This paper presents a comprehensive study of self-healing CI/CD pipelines that incorporate feedback-loop automation to achieve fault tolerance. We detail an architecture that integrates real-time anomaly detection (including machine learning-based techniques) and automated rollback mechanisms into popular CI/CD platforms (e.g., TeamCity, GitHub Actions, Jenkins). The goal is to minimize production downtime and human intervention by enabling the pipeline to detect issues and revert to a stable state autonomously. Grounded in large-scale industry deployments, our approach leverages continuous monitoring and intelligent decision-making to reduce mean time to recovery (MTTR) and improve reliability. We implement the system as a prototype and evaluate it with experiments that simulate deployment anomalies. Results show significantly faster failure detection and recovery, with MTTR improved by over 50%, together with high anomaly detection accuracy and efficient rollbacks. We discuss design trade-offs, such as balancing false positives in detection against safety, and highlight how feedback loops can continuously improve pipeline resilience. The findings demonstrate that self-healing CI/CD pipelines can substantially enhance the scalability and reliability of software delivery while minimizing manual oversight, paving the way for more autonomous DevOps processes.
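
The detect-then-revert loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' prototype: the function names (`is_anomalous`, `verify_deployment`), the error-rate metric, and the parameters `threshold` and `max_anomalies` are hypothetical, and a simple z-score test stands in for the machine learning-based detectors the paper refers to. The `rollback` callable is a placeholder for whatever revert step a given CI/CD platform provides.

```python
from statistics import mean, stdev
from typing import Callable, List, Sequence


def is_anomalous(baseline: Sequence[float], observed: float,
                 threshold: float = 3.0) -> bool:
    """Flag `observed` if it lies more than `threshold` standard
    deviations from the baseline mean (a plain z-score test)."""
    if len(baseline) < 2:
        return False  # not enough history to judge
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return observed != mu
    return abs(observed - mu) / sigma > threshold


def verify_deployment(baseline: Sequence[float],
                      post_deploy_samples: Sequence[float],
                      rollback: Callable[[], None],
                      threshold: float = 3.0,
                      max_anomalies: int = 2) -> str:
    """Watch post-deployment metric samples and trigger `rollback`
    once `max_anomalies` anomalous samples have been seen.

    Requiring more than one anomaly is one (assumed) way to trade
    false positives against reaction speed.
    """
    history: List[float] = list(baseline)  # copy; caller's data is untouched
    anomalies = 0
    for sample in post_deploy_samples:
        if is_anomalous(history, sample, threshold):
            anomalies += 1
            if anomalies >= max_anomalies:
                rollback()  # hypothetical hook: revert to last stable release
                return "rolled_back"
        else:
            # Feedback loop: healthy samples extend the baseline.
            history.append(sample)
    return "healthy"


if __name__ == "__main__":
    error_rates = [0.010, 0.012, 0.011, 0.009, 0.010]  # illustrative values
    status = verify_deployment(
        error_rates,
        [0.011, 0.25, 0.30],
        rollback=lambda: print("rolling back to previous release"),
    )
    print(status)
```

In this sketch a single spike does not trigger a rollback; only the second anomalous sample does. Lowering `max_anomalies` makes the loop react faster at the cost of more false-positive rollbacks.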

Published

08.12.2024

How to Cite

Alekhya Challa. (2024). Self-Healing CI/CD Pipelines with Feedback-Loop Automation: Building Fault-Tolerant CI/CD Systems Using Anomaly Detection and Automated Rollback Logic. International Journal of Intelligent Systems and Applications in Engineering, 12(23s), 3217 –. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/7651

Section

Research Article