Hybrid Telemetry Fusion for Early Detection of Systemwide Failures

Authors

  • Vijaya Krishna Namala

Keywords

Telemetry, Fusion, Anomaly, Detection, Distributed, Systems, Failures, Metrics, Logs, Traces, Signals, Monitoring, Prediction, Correlation, Reliability

Abstract

Modern distributed systems generate vast, heterogeneous streams of operational data, including metrics, logs, events, traces, configuration snapshots, and network-level signals. Although each telemetry source provides valuable insight, these sources are typically analyzed in isolation, which delays recognition of emerging systemwide failures. As applications scale across clusters, nodes, services, and network domains, failures increasingly manifest as subtle cross-layer interactions rather than isolated component faults. Conventional single-source monitoring is therefore limited in its ability to detect failures early, correlate related signals, or reconstruct the causal chain that leads to large-scale degradation. These limitations often result in reactive incident response, increased mean time to detection (MTTD), and an inability to predict systemwide impact before end users experience service disruption.

This research proposes a Hybrid Telemetry Fusion framework that addresses these limitations by integrating diverse observability data into a unified, multi-dimensional representation of system health. Instead of treating telemetry streams independently, the proposed approach fuses metrics, logs, traces, and network signals into enriched cross-layer feature sets capable of revealing early indicators of cascading failures. The framework incorporates telemetry alignment, temporal correlation, semantic enrichment, and multi-source feature construction to provide a more holistic view of system behavior. The primary objective is to close the current gap in early detection of large-scale failures by surfacing emerging anomalies that span multiple components, resource types, and operational layers. Specifically, the research addresses the challenge of fragmented observability through a fusion-based detection mechanism designed to identify systemwide instability earlier than traditional monitoring techniques.
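The abstract does not specify how streams are aligned or how fused features are built. The Python below is a minimal sketch under stated assumptions and is not the author's implementation. It places hypothetical metric, log, and trace samples on a shared fixed-width time grid and derives one feature row per window. The `Sample` type, the source labels, the 10-second window, and the three feature names are all illustrative assumptions. Semantic enrichment and temporal correlation are not shown.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    source: str   # hypothetical labels: "metric", "log", or "trace"
    ts: float     # event timestamp in seconds
    value: float  # metric reading or span latency (ms); for logs, each Sample is one error line


def fuse_windows(samples, window_s=10.0):
    """Align heterogeneous samples onto one time grid and build a feature row per window."""
    buckets = defaultdict(lambda: defaultdict(list))
    for s in samples:
        window = int(s.ts // window_s)  # every source maps onto the same window index
        buckets[window][s.source].append(s.value)

    rows = {}
    for window in sorted(buckets):
        b = buckets[window]
        rows[window] = {
            # 0.0 stands in for "no data" in this sketch.
            "metric_mean": mean(b["metric"]) if b["metric"] else 0.0,
            "log_errors": float(len(b["log"])),
            "trace_max_ms": max(b["trace"]) if b["trace"] else 0.0,
        }
    return rows


# Hypothetical samples spanning two 10-second windows.
samples = [
    Sample("metric", 1.0, 0.5), Sample("metric", 4.0, 0.7),
    Sample("log", 3.0, 1.0), Sample("trace", 8.0, 120.0),
    Sample("metric", 12.0, 0.9), Sample("log", 15.0, 1.0), Sample("log", 16.0, 1.0),
]
for window, features in fuse_windows(samples).items():
    print(window, features)
```

A shared window grid is the simplest alignment strategy. Production pipelines usually also handle clock skew and late-arriving data, which this sketch ignores. Filling absent signals with 0.0 is another simplification: it conflates "no data" with a zero reading.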
By systematically integrating hybrid telemetry sources, the proposed framework aims to detect fault-propagation patterns, cross-component anomalies, and early-warning signals that single-source analysis cannot capture. It thereby targets the core limitation of existing observability systems: their inability to correlate multi-modal signals into a coherent early indicator of impending systemwide failure.
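The sketch below illustrates why fusion can surface anomalies that single-source checks miss. It standardizes each feature of a window against a baseline of normal windows, then compares two rules. The first is a per-signal 3-sigma rule. The second is a combined score: the Euclidean norm of the z-score vector, which equals a Mahalanobis distance when features are assumed uncorrelated. This generic technique was chosen for illustration and is not the detection method of this article. The feature names, baseline values, and shared threshold of 3.0 are hypothetical.

```python
import math
from statistics import mean, pstdev


def zscores(baseline_rows, row):
    """Standardize each feature of `row` against baseline windows judged normal."""
    z = {}
    for name, value in row.items():
        history = [r[name] for r in baseline_rows]
        mu, sd = mean(history), pstdev(history)
        # A feature with zero baseline variance contributes nothing in this sketch.
        z[name] = (value - mu) / sd if sd > 0 else 0.0
    return z


def single_source_alarm(z, threshold=3.0):
    # Conventional per-signal rule: fire only if one feature alone is extreme.
    return any(abs(v) >= threshold for v in z.values())


def fused_alarm(z, threshold=3.0):
    # Fused rule: Euclidean norm of the standardized vector, i.e. a Mahalanobis
    # distance under the simplifying assumption of uncorrelated features.
    return math.sqrt(sum(v * v for v in z.values())) >= threshold


# Hypothetical baseline of four "normal" windows and one window under test.
baseline = [
    {"cpu_pct": 50, "log_errors": 0, "trace_max_ms": 100},
    {"cpu_pct": 70, "log_errors": 2, "trace_max_ms": 140},
    {"cpu_pct": 50, "log_errors": 2, "trace_max_ms": 100},
    {"cpu_pct": 70, "log_errors": 0, "trace_max_ms": 140},
]
current = {"cpu_pct": 80, "log_errors": 3, "trace_max_ms": 160}

z = zscores(baseline, current)
print(single_source_alarm(z), fused_alarm(z))  # prints: False True
```

Every signal in the test window sits only two standard deviations from its baseline, so no single-source rule fires. The combined deviation, however, is about 3.46, and the fused check raises an alarm. A norm threshold of 3.0 across three features produces more false alarms than a per-signal 3-sigma rule, so in practice the two thresholds would need separate calibration.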

Published

31.01.2023

How to Cite

Vijaya Krishna Namala. (2023). Hybrid Telemetry Fusion for Early Detection of Systemwide Failures. International Journal of Intelligent Systems and Applications in Engineering, 11(2s), 428 –. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/8003

Section

Research Article