Contrastive Failure Embeddings for Lakehouse Reliability Engineering

Mogana Kumaran Sivaraman

Keywords:

contrastive learning, failure embeddings, lakehouse reliability, incident retrieval

Abstract

This study proposes a contrastive embedding approach for reliability engineering in lakehouse environments. The objective is to learn a vector representation of failure episodes that captures operational similarity, enabling recurrence detection, nearest-neighbor retrieval, and similarity-based analysis of historical incidents. Failure records are constructed from normalized telemetry features reflecting execution behavior, resource usage, runtime outcomes, and contextual metadata. A neural encoder is trained using a supervised contrastive loss with temperature-controlled scaling, pulling same-class failure pairs together while pushing apart pairs from different classes to produce geometrically well-separated clusters suited to cosine-similarity retrieval. Because the encoder maps failures into a continuous space rather than assigning discrete class labels, new failure types can be accommodated without retraining: a novel failure occupies a previously unoccupied region of the space, preserving the open-set flexibility that production environments require. The method extends deterministic failure fingerprinting with a learned representation layer for cases where exact matching is insufficient, organizing failures by operational similarity rather than surface-level error attributes. Contrastive failure embeddings offer a practical mechanism for improving incident retrieval in lakehouse environments and a foundation for intelligent operational assistants.
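The training objective and retrieval step described above can be illustrated with a minimal sketch. It is not the author's implementation: the function names (`supcon_loss`, `nearest_incidents`), the default temperature of 0.1, the failure-class labels, and the incident identifiers are all hypothetical. The loss follows the general form of supervised contrastive learning (each anchor's same-class pairs are scored against all other samples under a temperature-scaled softmax), and retrieval ranks stored embeddings by cosine similarity. Pure Python is used so the sketch is self-contained; a real encoder would compute this loss on batched tensors.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length so dot products equal cosine similarity.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def supcon_loss(embeddings, labels, temperature=0.1):
    # Supervised contrastive loss over one batch (illustrative, unoptimized).
    # Anchors with no same-class partner in the batch are skipped.
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled similarities to every other sample in the batch.
        logits = {j: dot(z[i], z[j]) / temperature for j in range(n) if j != i}
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Average negative log-probability of the same-class pairs.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

def nearest_incidents(query, index, k=3):
    # Rank stored incident embeddings by cosine similarity to the query.
    q = l2_normalize(query)
    scored = [(dot(q, l2_normalize(vec)), incident_id)
              for incident_id, vec in index.items()]
    scored.sort(reverse=True)
    return [(incident_id, score) for score, incident_id in scored[:k]]

if __name__ == "__main__":
    # Hypothetical 2-D embeddings for two failure classes.
    batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    classes = ["oom", "oom", "skew", "skew"]
    print("loss:", supcon_loss(batch, classes))
    history = {"inc-1": [1.0, 0.0], "inc-2": [0.0, 1.0], "inc-3": [0.9, 0.1]}
    print("neighbors:", nearest_incidents([1.0, 0.0], history, k=2))
```

In this sketch a lower temperature sharpens the softmax, so hard negatives dominate the denominator. Because retrieval depends only on the stored vectors, adding a new incident to `history` requires no change to the loss code, which mirrors the open-set property the abstract describes.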
