Bird Sound Classification using Deep Neural Networks: A Comparative Analysis of State-of-the-Art Models and Custom Architectures

Shubham Revadekar; Vidhish Panchal; Pratik Kanani; Kamal Shah; Anil Vasoya; Ravikumar Pandey

Authors

Shubham Revadekar, Dwarkadas J. Sanghvi College of Engineering, Mumbai, India.
Vidhish Panchal, Dwarkadas J. Sanghvi College of Engineering, Mumbai, India.
Pratik Kanani, Dwarkadas J. Sanghvi College of Engineering, Mumbai, India.
Kamal Shah, Thakur College of Engineering and Technology, Mumbai, India.
Anil Vasoya, Thakur College of Engineering and Technology, Mumbai, India.
Ravikumar Pandey, Dwarkadas J. Sanghvi College of Engineering, Mumbai, India.

Keywords:

Bird sound classification, Deep Neural Networks, Mel Frequency Cepstral Coefficients, Spectrogram, CNN, Xception, InceptionV3, ResNet50, EfficientNet, VGG16, BirdCLEF 2022, Xeno-Canto

Abstract

Bird sound classification plays a vital role in ecological monitoring and biodiversity conservation. In this paper, we evaluate Deep Neural Networks (DNNs) for this task through a comparative analysis of five well-established architectures: Xception, InceptionV3, ResNet50, EfficientNet, and VGG16. The BirdCLEF 2022 dataset, distributed on Kaggle and built from Xeno-Canto recordings, serves as the foundation for our investigation. We extract acoustic features from the recordings using Mel Frequency Cepstral Coefficients (MFCC) and convert the audio files into spectrograms, which allows image-based classification techniques to be applied to audio data. In addition to the state-of-the-art models, we design and implement two custom Convolutional Neural Network (CNN) architectures. These custom models surpass several existing approaches, achieving accuracies of 80.11% and 76.94%, respectively. Our results offer insight into the performance and suitability of various DNN models for bird sound classification, and the success of the custom architectures highlights the potential of tailored solutions in this domain. The outcomes of this study have implications for bird species identification, ecological monitoring, and wildlife conservation, and support further work in avian soundscape analysis.
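The feature extraction step described in the abstract can be illustrated with a minimal sketch, using only the Python standard library. It frames a signal, computes a log-mel spectrogram, and derives MFCCs from each frame with an unnormalized DCT-II. The abstract does not state the frame size, hop length, number of mel bands, or mel-scale variant used in the study, so every parameter and function name below is an assumption chosen for illustration, and the naive DFT is far slower than the FFT an audio library would use.

```python
import math


def hz_to_mel(f_hz):
    """Convert Hz to mel (HTK-style formula; assumed, not taken from the paper)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters: n_mels rows, each with n_fft // 2 + 1 weights."""
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge frequencies, equally spaced on the mel scale.
    edges = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        left, center, right = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_freqs:
            if left < f <= center:
                row.append((f - left) / (center - left))
            elif center < f < right:
                row.append((right - f) / (right - center))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Power spectrum of one Hann-windowed frame via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2.0 * math.pi * k * i / n)
                 for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2.0 * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(re * re + im * im)
    return spectrum


def log_mel_spectrogram(signal, sample_rate, n_fft=256, hop=128, n_mels=20):
    """Return one list of n_mels log-mel energies (dB) per frame."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        power = power_spectrum(signal[start:start + n_fft])
        energies = [sum(w * p for w, p in zip(row, power)) for row in bank]
        # Small offset avoids log10(0) for empty bands.
        frames.append([10.0 * math.log10(e + 1e-10) for e in energies])
    return frames


def mfcc_from_log_mel(log_mel_frame, n_coeffs=13):
    """MFCCs as the first n_coeffs terms of an unnormalized DCT-II."""
    n = len(log_mel_frame)
    return [sum(v * math.cos(math.pi * c * (i + 0.5) / n)
                for i, v in enumerate(log_mel_frame))
            for c in range(n_coeffs)]


# Hypothetical usage: a 1 kHz test tone stands in for a bird recording.
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 1000.0 * t / sample_rate) for t in range(1024)]
spectrogram = log_mel_spectrogram(tone, sample_rate)
features = [mfcc_from_log_mel(frame) for frame in spectrogram]
print(len(spectrogram), "frames,", len(features[0]), "MFCCs per frame")
```

Arranged with time on one axis and mel band on the other, the log-mel frames form the kind of two-dimensional array that can be rendered as an image and passed to an image classifier. How the study resized or normalized such arrays before feeding them to the networks is not specified here.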
