Secure and Cost-Efficient Deployment of Data-Intensive AI Workloads in Cloud Platforms
Keywords:
Cloud Deployment, Data-Intensive Workloads, Artificial Intelligence Pipelines, Governance-Aware Optimization, Operational Observability
Abstract
Cloud infrastructure remains the primary deployment platform for data-intensive AI pipelines, with elastic compute and managed storage enabling rapid provisioning at scale. Yet engineering production-grade deployments remains a poorly solved problem. Performance, cost, security, and operational readiness are tightly coupled objectives, but existing deployment frameworks optimize them separately and typically satisfy only one objective at a time. End-to-end AI pipelines comprise data ingestion and transformation, feature generation, model training, batch inference, online serving, and continuous monitoring, and these stages differ in resource utilization and scaling behavior. A monolithic deployment strategy cannot meet the needs of all these stages simultaneously. We propose a framework that addresses data locality, elastic resource allocation, governance-aware isolation, and observability readiness at design time rather than retrofitting them after deployment. The framework generates candidate deployment plans along placement and scaling dimensions from stage-level workload characteristics, filters them through policy and observability feasibility gates, and emits run-time readiness artifacts for auditability and reliable operation. The problem is framed as a constrained multi-objective optimization whose objectives are tail latency, total cloud cost, and a surrogate for operational risk that incorporates exposure surface and blast radius. We investigate the trade-offs among data localization, elasticity, and governance, and show that joint planning can reveal deployment options that several performance-first and cost-first baselines overlook.
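The generate, filter, and optimize flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cost model, the gate rules, the risk cap, and every name (`Plan`, `estimate`, `passes_gates`, `pareto_front`) are hypothetical, and all numbers are toy values chosen only to make the trade-off visible.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Plan:
    placement: str            # "colocated" (compute next to data) or "remote"
    replicas: int             # scaling dimension
    p99_latency_ms: float     # objective 1: tail latency
    cost_per_hour: float      # objective 2: total cloud cost
    risk: float               # objective 3: exposure surface x blast radius


def estimate(placement: str, replicas: int) -> Plan:
    """Toy estimator; every coefficient here is an illustrative assumption."""
    colocated = placement == "colocated"
    latency = (120.0 if colocated else 200.0) / replicas + 20.0
    # Assumed pricing: colocated nodes cost more, remote adds a flat egress fee.
    cost = replicas * 1.25 if colocated else replicas * 1.0 + 0.5
    exposure = 1 if colocated else 2   # remote placement adds a network path
    return Plan(placement, replicas, latency, cost, float(exposure * replicas))


def passes_gates(plan: Plan, max_risk: float = 8.0) -> bool:
    """Feasibility gates applied before optimization (both rules are assumed)."""
    policy_ok = plan.risk <= max_risk   # governance cap on operational risk
    observable = plan.replicas >= 2     # assume health checks need a peer replica
    return policy_ok and observable


def objectives(plan: Plan) -> tuple:
    return (plan.p99_latency_ms, plan.cost_per_hour, plan.risk)


def dominates(a: Plan, b: Plan) -> bool:
    """a dominates b if it is no worse on every objective and better on one."""
    oa, ob = objectives(a), objectives(b)
    return all(x <= y for x, y in zip(oa, ob)) and oa != ob


def pareto_front(plans: list) -> list:
    return [p for p in plans if not any(dominates(q, p) for q in plans)]


# Candidate generation over the placement and scaling dimensions.
candidates = [estimate(pl, r)
              for pl, r in product(("colocated", "remote"), (1, 2, 4, 8))]
feasible = [p for p in candidates if passes_gates(p)]
front = pareto_front(feasible)

# Single-objective baselines for comparison.
latency_first = min(feasible, key=lambda p: p.p99_latency_ms)
cost_first = min(feasible, key=lambda p: p.cost_per_hour)

for p in front:
    print(p)
```

With these toy numbers, the latency-first baseline selects the largest colocated plan and the cost-first baseline selects the smallest feasible one, while the Pareto front also retains intermediate plans that neither baseline returns. That mirrors the abstract's claim only in miniature and says nothing about the paper's measured results.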
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license permits others to share and adapt the material provided they give appropriate credit, provide a link to the license, and indicate if changes were made. Anyone who remixes, transforms, or builds upon the material must distribute their contributions under the same license as the original.


