Reinforcement Learning Hadoop Map Reduce Parameters Optimization
Keywords:
Hadoop, reinforcement learning, Q-learning, MapReduce, HDFS
Abstract
Among the techniques for enhancing Hadoop performance (such as intermediate data compression, in-memory management, and parameter tuning), dynamic tuning of configuration parameters proves to be the most impactful. However, existing approaches face several challenges: limited adaptability to the requirements of specific applications, tuning of parameters in isolation without considering their interdependencies, and inaccurate linear assumptions in complex environments. To address these issues, this study introduces a reinforcement learning-based optimization framework using Q-Learning. The proposed method dynamically adjusts key Hadoop configuration parameters by continuously learning from job execution metrics such as completion time and wait times in the map and reduce phases. It employs a reward-based feedback mechanism to minimize the gap between expected and actual performance, making the optimization more accurate, adaptive, and holistic. Additionally, the framework integrates a neural network that predicts optimal parameter values to further support decision-making. This approach significantly improves execution efficiency and resource utilization, adapts robustly across diverse workloads and operational environments, and aligns closely with service level agreements.
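The Q-Learning loop described in the abstract can be illustrated with a small tabular sketch. This is not the authors' implementation. The two properties (`mapreduce.task.io.sort.mb` and `mapreduce.job.reduces`) are real Hadoop settings, but choosing them, their candidate values, the `simulated_job_time` cost surface, the target completion time, and the hyperparameters are all assumptions made for illustration. The neural network predictor mentioned in the abstract is omitted.

```python
import itertools
import random

# Hypothetical candidate values; the abstract does not list the tuned
# parameters or their ranges.
SORT_MB = [64, 128, 256, 512]   # mapreduce.task.io.sort.mb
REDUCES = [2, 4, 8, 16]         # mapreduce.job.reduces
STATES = list(itertools.product(range(len(SORT_MB)), range(len(REDUCES))))
# Each action shifts one parameter index by one step, or keeps the config.
ACTIONS = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]

TARGET_SECONDS = 100.0  # assumed "expected" completion time (SLA-style)


def simulated_job_time(state, rng):
    """Stand-in for a real job run; a real setup would read job metrics."""
    i, j = state
    # Made-up cost surface with an interaction term between the parameters.
    base = 100 + 15 * (i - 2) ** 2 + 10 * (j - 2) ** 2 + 5 * (i - 2) * (j - 2)
    return base + rng.uniform(-2, 2)  # small measurement noise


def step(state, action):
    """Apply an action, clamping indices to the candidate ranges."""
    i = min(max(state[0] + action[0], 0), len(SORT_MB) - 1)
    j = min(max(state[1] + action[1], 0), len(REDUCES) - 1)
    return (i, j)


def train(episodes=500, steps=20, alpha=0.2, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in range(len(ACTIONS))}
    for _ in range(episodes):
        state = rng.choice(STATES)
        for _ in range(steps):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda k: q[(state, k)])
            nxt = step(state, ACTIONS[a])
            # Reward is the negative gap between actual and expected time.
            reward = -abs(simulated_job_time(nxt, rng) - TARGET_SECONDS)
            best_next = max(q[(nxt, k)] for k in range(len(ACTIONS)))
            # Standard Q-learning update toward reward + discounted best value.
            q[(state, a)] += alpha * (reward + gamma * best_next - q[(state, a)])
            state = nxt
    return q


def greedy_config(q, start=(0, 0), max_moves=10):
    """Follow the learned greedy policy and return the resulting settings."""
    state = start
    for _ in range(max_moves):
        a = max(range(len(ACTIONS)), key=lambda k: q[(state, k)])
        state = step(state, ACTIONS[a])
    return SORT_MB[state[0]], REDUCES[state[1]]
```

Calling `greedy_config(train())` returns a `(sort_mb, reduces)` pair chosen by the learned policy. Because the action set moves both parameters over a shared state, the learned values can reflect the interaction between them rather than tuning each in isolation. In a real cluster, `simulated_job_time` would be replaced by an actual job run and its measured completion and wait times.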
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Under this license, anyone who shares or adapts the material must give appropriate credit, provide a link to the license, and indicate whether changes were made. Anyone who remixes, transforms, or builds upon the material must distribute their contributions under the same license as the original.