Bird Sound Classification using Deep Neural Networks: A Comparative Analysis of State-of-the-Art Models and Custom Architectures
Keywords:
Bird sound classification, Deep Neural Networks, Mel Frequency Cepstral Coefficients, Spectrogram, CNN, Xception, InceptionV3, ResNet50, EfficientNet, VGG16, BirdCLEF 2022, Xeno-Canto
Abstract
Bird sound classification plays a vital role in ecological monitoring and biodiversity conservation. In this paper, we evaluate Deep Neural Networks (DNNs) for this task through a comparative analysis of five well-established architectures: Xception, InceptionV3, ResNet50, EfficientNet, and VGG16. The BirdCLEF 2022 dataset, available on Kaggle and sourced from Xeno-Canto, serves as the foundation for our investigation. We extract acoustic features from the recordings using Mel Frequency Cepstral Coefficients (MFCC), and we convert the audio files into spectrograms so that image-based classification techniques can be applied to the audio data. In addition to the state-of-the-art models, we design and implement two custom Convolutional Neural Network (CNN) architectures. These models surpass several existing approaches, achieving accuracy rates of 80.11% and 76.94%, respectively. Our results offer insight into the performance and suitability of various DNN models for bird sound classification, and the success of the custom architectures highlights the potential of tailored solutions in this domain. The outcomes of this study have implications for bird species identification, ecological monitoring, and wildlife conservation, and they support further work on avian soundscape analysis.
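As an illustration of the feature-extraction step described in the abstract, the following is a minimal sketch of computing a log-mel spectrogram and MFCCs from a waveform. It is not the authors' pipeline: the abstract does not state the sampling rate, window size, hop length, number of mel bands, or number of coefficients, so every parameter value here is an assumption, the function names are hypothetical, and a synthetic 3 kHz tone stands in for a Xeno-Canto recording. Only NumPy is used.

```python
import numpy as np

def hz_to_mel(f):
    # Common (HTK-style) mel-scale formula.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_edges(sr, n_mels):
    # n_mels + 2 frequencies (Hz), evenly spaced on the mel scale up to Nyquist.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    return mel_to_hz(mels)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters, shape (n_mels, n_fft // 2 + 1).
    edges = mel_band_edges(sr, n_mels)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, ctr, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr, n_fft=1024, hop=512, n_mels=64):
    # Assumed parameters; the paper's actual settings are not given.
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)            # windowed frames
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # power spectrum
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T   # (n_frames, n_mels)
    return 10.0 * np.log10(mel + 1e-10).T               # dB, (n_mels, n_frames)

def mfcc(log_mel, n_mfcc=13):
    # Unnormalized DCT-II along the mel axis; keep the first n_mfcc coefficients.
    n_mels = log_mel.shape[0]
    n = np.arange(n_mels)
    k = np.arange(n_mfcc)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return basis @ log_mel                              # (n_mfcc, n_frames)

# Synthetic stand-in for a recording: one second of a 3 kHz tone.
sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 3000.0 * t)

S = log_mel_spectrogram(tone, sr)   # 2-D array that can be treated as an image
M = mfcc(S)                         # compact cepstral features per frame
peak_band = int(S.mean(axis=1).argmax())
peak_hz = mel_band_edges(sr, 64)[peak_band + 1]   # centre of the strongest band
```

The array `S` is the kind of two-dimensional time-frequency representation that can be resized and fed to an image classifier, while `M` is a lower-dimensional summary of the same frames. In practice an audio library would usually replace these hand-written helpers.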
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Under this license, readers must give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.