ZIA Global Observability Architecture: A Scalable, Fault-Isolated, and GitOps-Driven Approach for Hyper-Distributed Cloud Systems
Keywords:
cardinality governance, cloud-native observability, fault isolation, federated telemetry aggregation, GitOps, Kafka telemetry pipeline
Abstract
Modern enterprises operating at hyper-scale face profound observability challenges: telemetry from more than 100,000 active nodes across 13 or more geographically distributed cloud environments, metric cardinality that strains storage and query infrastructure, and centralized monitoring architectures that introduce single points of failure at precisely the moments when visibility is most critical. This paper presents the Zscaler Internet Access (ZIA) Global Observability Architecture, a novel framework that redefines observability as a distributed, developer-owned, and code-governed system. The architecture introduces four interlocking innovations: a Kafka-backed durable telemetry pipeline that achieves zero data loss through 24-to-72-hour message retention; per-cloud and per-domain fault isolation, which ensures that failures within one cloud environment do not propagate globally; an observability-as-code model via GitOps, which places all dashboards, alerts, and recording rules under version control and continuous integration (CI); and federated metric aggregation through KloudFuse, which receives pre-aggregated signals to eliminate global cardinality explosion. Evaluated across production deployments spanning multiple Amazon Web Services (AWS) and Google Cloud Platform (GCP) regions, the architecture achieves sub-minute telemetry freshness, 100% dashboard standardization, a 5.9-fold improvement in alert fidelity, and a reduction in data loss from 2.3% to 0.0% per month. The ZIA framework provides a replicable engineering blueprint for enterprises seeking to transition from fragmented monitoring stacks to a unified, resilient, and developer-centric observability ecosystem.
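The pre-aggregation idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each cloud sums its samples after dropping per-node labels, so that only low-cardinality series are forwarded to the global tier. The label names (`node_id`, `pod`, `instance`), the metric name, and the `pre_aggregate` helper are hypothetical and are not taken from the paper; the actual ZIA and KloudFuse mechanics may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical high-cardinality labels; the paper does not list its label schema.
HIGH_CARDINALITY_LABELS = {"node_id", "pod", "instance"}

Sample = Tuple[str, Dict[str, str], float]        # (metric name, labels, value)
SeriesKey = Tuple[str, Tuple[Tuple[str, str], ...]]


def pre_aggregate(samples: Iterable[Sample]) -> Dict[SeriesKey, float]:
    """Sum samples per metric after dropping high-cardinality labels."""
    totals: Dict[SeriesKey, float] = defaultdict(float)
    for name, labels, value in samples:
        # Keep only the low-cardinality labels, sorted so the key is stable.
        kept = tuple(sorted(
            (k, v) for k, v in labels.items()
            if k not in HIGH_CARDINALITY_LABELS
        ))
        totals[(name, kept)] += value
    return dict(totals)


# Three per-node series collapse into two per-cloud series.
samples = [
    ("http_requests_total", {"cloud": "aws-us-east-1", "node_id": "n1"}, 10.0),
    ("http_requests_total", {"cloud": "aws-us-east-1", "node_id": "n2"}, 5.0),
    ("http_requests_total", {"cloud": "gcp-europe-west1", "node_id": "n3"}, 7.0),
]
aggregated = pre_aggregate(samples)
for (name, labels), value in sorted(aggregated.items()):
    print(name, dict(labels), value)
```

In this sketch the number of series sent upstream depends on the number of clouds rather than the number of nodes, which is the property the abstract attributes to receiving pre-aggregated signals.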
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license requires users to give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.


