Reinforcement Learning Hadoop Map Reduce Parameters Optimization

Authors

  • Nandita Yambem, Rashmi S, A N Nandakumar

Keywords

Hadoop, reinforcement learning, Q-learning, MapReduce, HDFS

Abstract

Among the techniques for improving Hadoop performance, such as intermediate data compression, in-memory management, and parameter tuning, dynamic tuning of configuration parameters proves to be the most impactful. However, existing approaches face several challenges: limited adaptability to application-specific requirements, tuning of parameters in isolation without regard to their interdependencies, and linear assumptions that are inaccurate in complex environments. To address these issues, this study introduces a reinforcement learning-based optimization framework using Q-learning. The proposed method dynamically adjusts key Hadoop configuration parameters by continuously learning from job execution metrics such as completion time and wait times in the map and reduce phases. It employs a reward-based feedback mechanism to minimize the gap between expected and actual performance, making the optimization more accurate, adaptive, and holistic. The framework also integrates a neural network to predict optimal parameter values, which further supports decision-making. This approach significantly improves execution efficiency and resource utilization, adapts robustly across diverse workloads and operational environments, and aligns closely with service level agreements.
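
The reward-driven tuning loop described in the abstract can be illustrated with a minimal tabular Q-learning sketch. This is not the authors' implementation: the choice of a single parameter (`mapreduce.task.io.sort.mb`), the candidate values, the `simulated_job_time` cost curve, and all hyperparameters are assumptions made for illustration only. The neural network component is omitted, and a real setup would replace the simulated function with measured job completion times.

```python
import random

# Illustrative candidate values for one Hadoop parameter.
# mapreduce.task.io.sort.mb is a real Hadoop property; these values are assumptions.
SORT_MB_VALUES = [50, 100, 200, 400, 800]
ACTIONS = (-1, 0, 1)  # move to a lower setting, keep it, move to a higher setting


def simulated_job_time(sort_mb, rng):
    """Hypothetical stand-in for running a job: a made-up curve with noise, best near 200."""
    return 100.0 + 0.001 * (sort_mb - 200) ** 2 + rng.uniform(-1.0, 1.0)


def reward(actual_time, expected_time):
    """Negative gap between actual and expected completion time."""
    return -abs(actual_time - expected_time)


def step(state, action_index):
    """Apply an action to the current setting index, clamped to the valid range."""
    return min(len(SORT_MB_VALUES) - 1, max(0, state + ACTIONS[action_index]))


def best_action(q_row):
    """Index of the highest-valued action in one row of the Q-table."""
    return max(range(len(q_row)), key=lambda i: q_row[i])


def train(episodes=300, steps=20, alpha=0.2, gamma=0.9, epsilon=0.2,
          expected_time=100.0, seed=0):
    """Learn a Q-table over (current setting, adjustment) pairs."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in SORT_MB_VALUES]
    for _ in range(episodes):
        state = rng.randrange(len(SORT_MB_VALUES))
        for _ in range(steps):
            # Epsilon-greedy choice between exploring and exploiting.
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))
            else:
                action = best_action(q[state])
            next_state = step(state, action)
            job_time = simulated_job_time(SORT_MB_VALUES[next_state], rng)
            r = reward(job_time, expected_time)
            # Standard Q-learning update.
            q[state][action] += alpha * (r + gamma * max(q[next_state]) - q[state][action])
            state = next_state
    return q


def greedy_setting(q, start=0, max_steps=10):
    """Follow the learned policy from a starting setting until it stops moving."""
    state = start
    for _ in range(max_steps):
        next_state = step(state, best_action(q[state]))
        if next_state == state:
            break
        state = next_state
    return SORT_MB_VALUES[state]


if __name__ == "__main__":
    q_table = train()
    print("Suggested mapreduce.task.io.sort.mb:", greedy_setting(q_table))
```

The state here is only the current parameter setting. Handling several interdependent parameters, as the abstract describes, would require a larger joint state and action space, which is one plausible reason for pairing Q-learning with a neural network predictor.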

Published

19.04.2025

How to Cite

Nandita Yambem. (2025). Reinforcement Learning Hadoop Map Reduce Parameters Optimization. International Journal of Intelligent Systems and Applications in Engineering, 13(1), 63 –. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/7448

Section

Research Article