Contrastive Failure Embeddings for Lakehouse Reliability Engineering
Keywords:
contrastive learning, failure embeddings, lakehouse reliability, incident retrieval
Abstract
This study proposes a contrastive embedding approach for reliability engineering in lakehouse environments. The objective is to learn a vector representation of failure episodes that captures operational similarity, enabling recurrence detection, nearest-neighbor retrieval, and similarity-based analysis of historical incidents. Failure records are constructed from normalized telemetry features reflecting execution behavior, resource usage, runtime outcomes, and contextual metadata. A neural encoder is trained using a supervised contrastive loss with temperature-controlled scaling, pulling same-class failure pairs together while pushing apart pairs from different classes to produce geometrically well-separated clusters suited to cosine-similarity retrieval. Because the encoder maps failures into a continuous space rather than assigning discrete class labels, new failure types can be accommodated without retraining: a novel failure occupies a previously unoccupied region of the space, preserving the open-set flexibility that production environments require. The method extends deterministic failure fingerprinting with a learned representation layer for cases where exact matching is insufficient, organizing failures by operational similarity rather than surface-level error attributes. Contrastive failure embeddings offer a practical mechanism for improving incident retrieval in lakehouse environments and a foundation for intelligent operational assistants.
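The training objective and retrieval step described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact loss, so this assumes the commonly used per-anchor form of the supervised contrastive loss, computed on L2-normalized embeddings. The function names (`l2_normalize`, `supervised_contrastive_loss`, `top_k_similar`) and the default temperature of 0.1 are illustrative assumptions, not details taken from the study.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Mean supervised contrastive loss over anchors that have a positive.

    Assumed form: for anchor i, average -log softmax(z_i . z_p / temperature)
    over all same-label samples p != i, with the softmax taken over every
    other sample in the batch.
    """
    z = l2_normalize(np.asarray(embeddings, dtype=np.float64))
    labels = np.asarray(labels)
    n = z.shape[0]

    not_self = ~np.eye(n, dtype=bool)
    positives = (labels[:, None] == labels[None, :]) & not_self
    pos_counts = positives.sum(axis=1)
    has_pos = pos_counts > 0
    if not has_pos.any():
        return 0.0  # no same-class pairs in the batch, nothing to pull together

    # Temperature-scaled cosine similarities between all pairs.
    logits = (z @ z.T) / temperature
    # Subtract the row maximum for numerical stability (does not change the loss).
    logits = logits - logits.max(axis=1, keepdims=True)
    exp_logits = np.exp(logits) * not_self
    log_prob = logits - np.log(exp_logits.sum(axis=1, keepdims=True))

    # Average log-probability of the positives for each anchor.
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    mean_pos = pos_log_prob[has_pos] / pos_counts[has_pos]
    return float(-mean_pos.mean())


def top_k_similar(query, index, k=3):
    """Return (row, cosine similarity) for the k stored failures nearest to query."""
    q = l2_normalize(np.asarray(query, dtype=np.float64).reshape(1, -1))
    idx = l2_normalize(np.asarray(index, dtype=np.float64))
    sims = (idx @ q.T).ravel()
    order = np.argsort(-sims)[:k]
    return [(int(i), float(sims[i])) for i in order]
```

In this sketch a lower temperature sharpens the softmax, so the hardest negatives dominate the loss. The brute-force search in `top_k_similar` is only reasonable for small incident histories; a larger archive would typically use an approximate nearest-neighbor index instead.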
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license requires users to give appropriate credit, provide a link to the license, and indicate if changes were made. If they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.


