Hyperparameters in Deep Learning: A Comprehensive Review
Keywords:
Deep Learning, Hyperparameter, Optimization, Training, Validation

Abstract
Hyperparameters play a pivotal role in the training and performance of deep learning models. This review examines the main types of hyperparameters, their impact on model performance, strategies for hyperparameter optimization, and recent advances in the field, with emphasis on practical considerations and state-of-the-art techniques. Mastering hyperparameter optimization is crucial for realizing the full potential of deep learning models: by understanding which hyperparameters matter, how they affect training, and how to apply modern optimization strategies, researchers and practitioners can improve model performance across a wide range of applications. The review consolidates practical insights and current methodologies into a single guide for navigating hyperparameter tuning in deep learning.
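As an illustration of the kind of optimization strategy the review surveys, the sketch below performs a simple random search over three common hyperparameters: learning rate, batch size, and dropout rate. It is a minimal, self-contained example, not a method taken from the article: the `evaluate` function is a hypothetical stand-in for training a model and returning its validation accuracy, and the search space and trial budget are assumed values chosen only for demonstration.

```python
import math
import random

# Hypothetical stand-in for "train a model with this configuration and
# return its validation accuracy"; a real run would build and fit a network.
def evaluate(config):
    # Toy surrogate with peak score near lr=1e-3, batch_size=64, dropout=0.3.
    lr_term = -abs(math.log10(config["lr"]) - math.log10(1e-3))
    bs_term = -abs(math.log2(config["batch_size"]) - math.log2(64)) * 0.1
    do_term = -abs(config["dropout"] - 0.3)
    return 0.9 + 0.05 * (lr_term + bs_term + do_term) + random.gauss(0, 0.005)

def sample_config():
    """Draw one configuration from an assumed search space."""
    return {
        # Learning rate sampled log-uniformly between 1e-5 and 1e-1.
        "lr": 10 ** random.uniform(-5, -1),
        "batch_size": random.choice([16, 32, 64, 128, 256]),
        "dropout": random.uniform(0.0, 0.5),
    }

def random_search(n_trials=20, seed=0):
    """Evaluate n_trials random configurations and keep the best one."""
    random.seed(seed)
    best_config, best_score = None, float("-inf")
    for trial in range(n_trials):
        config = sample_config()
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
        print(f"trial {trial:2d}  score={score:.4f}  config={config}")
    return best_config, best_score

if __name__ == "__main__":
    best, score = random_search()
    print(f"best validation score {score:.4f} with {best}")
```

Grid search, Bayesian optimization, and bandit-based methods such as Hyperband follow the same propose-and-score pattern against a validation set; they differ mainly in how the next configuration is chosen and how much budget each trial receives.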