Automated Data Pipeline Optimization for Real-Time Machine Learning Inference
Keywords:
Machine Learning, Automation, Automated Data Pipeline, Real-Time Inference
Abstract
Growing demand for real-time machine learning (ML) requires efficient data pipelines that cover data preprocessing, feature selection, and model evaluation. This work presents a system that automates the data pipeline; it streamlines the ML workflow, reduces the chance of human error, and improves the accuracy of predictive models. Built with Python, the Scikit-learn library, and Streamlit, the system lets users upload data, preprocess it, choose a feature selection method, and evaluate models. The presented results show higher effectiveness and make the tool accessible to a larger number of users. The system has limitations, including dataset compatibility issues, computation time, and memory usage; future extensions based on deep learning, real-time data streaming, and cloud deployment are expected to improve the prospects for automation in ML.
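The abstract describes a Scikit-learn system that preprocesses uploaded data, applies a user-chosen feature selection method, and evaluates models. The sketch below shows one way such steps can be chained, assuming a tabular classification dataset. The function names `build_pipeline` and `evaluate_pipeline`, the median imputation, the `SelectKBest` selector with ANOVA F-scores, the logistic regression model, and the synthetic data are all illustrative assumptions, not the authors' implementation; the Streamlit upload interface is omitted.

```python
# Hypothetical sketch: preprocessing -> feature selection -> model evaluation.
# Names, model choice and data are assumptions, not the authors' code.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline(k_features: int) -> Pipeline:
    """Chain preprocessing, feature selection and a classifier."""
    return Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # fill missing values per column
        ("scale", StandardScaler()),                    # zero mean, unit variance
        ("select", SelectKBest(score_func=f_classif, k=k_features)),  # top-k by ANOVA F-score
        ("model", LogisticRegression(max_iter=1000)),
    ])


def evaluate_pipeline(X, y, k_values=(5, 10, 20), cv=5):
    """Return the mean cross-validated accuracy for each candidate k."""
    return {
        k: cross_val_score(build_pipeline(k), X, y, cv=cv, scoring="accuracy").mean()
        for k in k_values
    }


if __name__ == "__main__":
    # Synthetic stand-in for an uploaded dataset, with ~5% missing values.
    X, y = make_classification(n_samples=500, n_features=30,
                               n_informative=8, random_state=0)
    rng = np.random.default_rng(0)
    X[rng.random(X.shape) < 0.05] = np.nan

    scores = evaluate_pipeline(X, y)
    for k, acc in scores.items():
        print(f"k={k:2d}  mean accuracy={acc:.3f}")
    best_k = max(scores, key=scores.get)

    # Refit on all data, then score one incoming record (real-time style).
    final = build_pipeline(best_k).fit(X, y)
    print("prediction for first record:", final.predict(X[:1]))
```

Wrapping the steps in a single `Pipeline` means that, within each cross-validation fold, the imputer, scaler, and selector are fitted only on the training split, so feature selection does not see the held-out fold. The refitted pipeline then applies the same preprocessing to a single incoming record before prediction.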
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license requires users to give appropriate credit, provide a link to the license, and indicate whether changes were made; anyone who remixes, transforms, or builds upon the material must distribute their contributions under the same license as the original.