Hyperparameters in Deep Learning: A Comprehensive Review

Authors

  • Jatender Kumar, Naveen Dalal, Monika Sethi

Keywords

Deep Learning, Hyperparameter, Optimization, Training, Validation

Abstract

Hyperparameters play a pivotal role in the training and performance of deep learning models. This review surveys the main types of hyperparameters, their impact on model performance, strategies for hyperparameter optimization, and recent advancements in the field, with emphasis on practical considerations and state-of-the-art techniques. By understanding which hyperparameters matter, how they affect training, and how to search over them efficiently, researchers and practitioners can substantially improve model performance across a wide range of applications. The review consolidates practical insights and current methodologies into a single guide for navigating the intricacies of hyperparameter tuning in deep learning.
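To illustrate one common hyperparameter-optimization strategy touched on in this review, the following minimal Python sketch performs a random search over three typical hyperparameters (learning rate, batch size, dropout). It is an assumption-laden illustration, not a prescribed method: train_and_validate is a hypothetical placeholder for a real training-and-validation run, and the search-space values are illustrative rather than recommended settings.

import random

# Minimal random-search sketch (illustrative only).
# train_and_validate is a hypothetical placeholder for a real run that would
# train a model with the given hyperparameters and return a validation score.
def train_and_validate(learning_rate, batch_size, dropout):
    # Toy objective standing in for real validation accuracy.
    return 1.0 - abs(learning_rate - 1e-3) - abs(dropout - 0.3) / 10 - abs(batch_size - 64) / 1000

# Illustrative search space over three common hyperparameters.
search_space = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3, 1e-2],
    "batch_size": [16, 32, 64, 128],
    "dropout": [0.0, 0.1, 0.3, 0.5],
}

best_score, best_config = float("-inf"), None
for trial in range(20):
    # Sample one value per hyperparameter uniformly at random.
    config = {name: random.choice(values) for name, values in search_space.items()}
    score = train_and_validate(**config)
    if score > best_score:
        best_score, best_config = score, config

print("Best configuration:", best_config, "validation score:", best_score)

In practice, the number of trials and the ranges sampled depend on the available compute budget; random search is often preferred over exhaustive grid search because, for the same number of training runs, it explores more distinct values of the hyperparameters that actually matter.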

Published

12.06.2024

How to Cite

Kumar, J., Dalal, N., & Sethi, M. (2024). Hyperparameters in Deep Learning: A Comprehensive Review. International Journal of Intelligent Systems and Applications in Engineering, 12(4), 4015–. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/6967

Section

Research Article