A DenseU-Net framework for Music Source Separation using Spectrogram Domain Approach

Authors

  • Vinitha George E., V. P. Devassia

Keywords:

Autoencoder, Convolutional Neural Network, Deep learning, DenseNet, Music source separation, ResNet, U-Net architecture

Abstract

Audio source separation has been intensively explored by the research community, and deep learning algorithms now make it possible to train neural network models that isolate the individual sources present in a music mixture. In this paper, we propose an algorithm that separates the constituent sources of a music signal mixture using a DenseU-Net framework. Converting an audio signal into a spectrogram, which can be treated as an image, accentuates valuable attributes that remain concealed in the time-domain signal; a spectrogram-based model is therefore chosen for extracting the target signal. The model incorporates a dense block into the layers of the U-Net structure and is trained to extract individual source spectrograms from the mixture spectrogram. An ablation study in which the dense blocks were replaced with plain convolution filters was performed to assess the effectiveness of the dense blocks. The proposed method proves more effective than other state-of-the-art methods: experiments separating vocals, bass, drums, and other accompaniment show an average SDR of 6.59 dB on the MUSDB database.
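
For readers who want a concrete picture of the architecture the abstract describes, the following is a minimal PyTorch sketch of a spectrogram-domain DenseU-Net: dense blocks, whose layers each receive the concatenation of all preceding feature maps, take the place of the plain convolution stages of a U-Net, and the network predicts a soft mask that is applied to the mixture magnitude spectrogram. The depths, growth rate, and layer sizes here are illustrative assumptions, not the configuration reported in the paper.

    import torch
    import torch.nn as nn

    class DenseBlock(nn.Module):
        # Each conv layer sees the concatenation of the block input and all
        # feature maps produced so far (the DenseNet connectivity pattern).
        def __init__(self, in_ch, growth=16, n_layers=3):
            super().__init__()
            self.layers = nn.ModuleList()
            ch = in_ch
            for _ in range(n_layers):
                self.layers.append(nn.Sequential(
                    nn.BatchNorm2d(ch),
                    nn.ReLU(inplace=True),
                    nn.Conv2d(ch, growth, kernel_size=3, padding=1)))
                ch += growth
            self.out_channels = ch

        def forward(self, x):
            feats = [x]
            for layer in self.layers:
                feats.append(layer(torch.cat(feats, dim=1)))
            return torch.cat(feats, dim=1)

    class DenseUNet(nn.Module):
        # A two-level U-Net whose encoder/decoder stages are dense blocks.
        def __init__(self, in_ch=1):
            super().__init__()
            self.enc = DenseBlock(in_ch)
            self.pool = nn.MaxPool2d(2)
            self.mid = DenseBlock(self.enc.out_channels)
            self.up = nn.ConvTranspose2d(self.mid.out_channels,
                                         self.mid.out_channels,
                                         kernel_size=2, stride=2)
            self.dec = DenseBlock(self.mid.out_channels + self.enc.out_channels)
            # A sigmoid mask keeps the estimate within the mixture's magnitude.
            self.mask = nn.Sequential(
                nn.Conv2d(self.dec.out_channels, in_ch, kernel_size=1),
                nn.Sigmoid())

        def forward(self, mix_mag):
            e = self.enc(mix_mag)
            m = self.mid(self.pool(e))
            d = self.dec(torch.cat([self.up(m), e], dim=1))  # skip connection
            return self.mask(d) * mix_mag  # estimated source magnitude

    # Mixture waveform -> magnitude spectrogram -> estimated source magnitude.
    wave = torch.randn(1, 16384)  # stand-in for a short mixture excerpt
    spec = torch.stft(wave, n_fft=1024, hop_length=256,
                      window=torch.hann_window(1024), return_complex=True)
    mix_mag = spec.abs().unsqueeze(1)[..., :512, :64]  # crop to even dims
    est_mag = DenseUNet()(mix_mag)

In a setup like this, the masked magnitude would be trained against the isolated source's magnitude (for instance with an L1 loss), and at inference the mixture phase is typically reused to invert the estimated spectrogram back to audio; both are standard choices in spectrogram-domain separation, though the paper's exact loss and reconstruction details should be taken from the text itself.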

References

G. Ozmen, I. A. Ozkan, I. Seref, S. Tasdemir, Ç. Mustafa, and E. Arslan, “Sound analysis to recognize cattle vocalization in a semi-open barn,” Gazi Muhendislik Bilimleri Dergisi, vol. 8, no. 1, pp. 158–167, 2022.

K. Binjaku, J. Janku, and E. K. Meçe, “Identifying Low-Resource Languages in Speech Recordings through Deep Learning,” in 2022 International Conference on Software, Telecommunications and Computer Networks (SoftCOM), pp. 1–6, IEEE, Sep. 2022.

W. Yuan, S. Wang, X. Li, M. Unoki, and W. Wang, “A skip attention mechanism for monaural singing voice separation,” IEEE Signal Processing Letters, vol. 26, no. 10, pp. 1481–1485, 2019.

A. Klapuri, T. Virtanen, and T. Heittola, “Sound source separation in monaural music signals using excitation-filter model and EM algorithm,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’10), pp. 5510–5513, Mar. 2010.

Y. Zhao, D. Wang, E. M. Johnson, and E. W. Healy, “A deep learning based segregation algorithm to increase speech intelligibility for hearing-impaired listeners in reverberant-noisy conditions,” The Journal of the Acoustical Society of America, vol. 144, no. 3, pp. 1627–1637, 2018.

P.-S. Huang et al., “Singing-voice separation from monaural recordings using robust principal component analysis,” in 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2012.

A. Hyvärinen and E. Oja, “Independent component analysis: algorithms and applications,” Neural Networks, vol. 13, no. 4–5, pp. 411–430, Jun. 2000.

T. Virtanen, “Monaural sound source separation by non-negative matrix factorization with temporal continuity and sparseness criteria,” IEEE Transactions on Audio, Speech and Language Processing, vol. 15, no. 3, pp. 1066–1074, 2007.

S. M. Doğan and Ö. Salor, “Music/singing voice separation based on repeating pattern extraction technique and robust principal component analysis,” in 2018 5th International Conference on Electrical and Electronic Engineering (ICEEE), pp. 482–487, IEEE, May 2018.

P. S. Huang, M. Kim, M. Hasegawa-Johnson, and P. Smaragdis, “Joint optimization of masks and deep recurrent neural networks for monaural source separation,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 12, pp. 2136–2147, 2015.

V. S. Kadandale, J. F. Montesinos, G. Haro, and E. Gómez, “Multi-channel U-Net for Music Source Separation,” in IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP), pp. 1–6, IEEE, 2020.

D. Stoller, S. Ewert, and S. Dixon, “Wave-U-Net: A multiscale neural network for end-to-end audio source separation,” in Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR 2018), pp. 334–340, 2018.

D. Samuel, A. Ganeshan, and J. Naradowsky, “Meta-learning Extractors for Music Source Separation,” in Proceedings of the 45th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020), May 2020.

A. Défossez, N. Usunier, L. Bottou, and F. Bach, “Music source separation in the waveform domain,” arXiv preprint arXiv:1911.13254, 2019.

P. Chandna, M. Miron, J. Janer, and E. Gómez, “Monoaural audio source separation using deep convolutional neural networks,” in International Conference on Latent Variable Analysis and Signal Separation, pp. 258–266, Springer, 2017.

S. Uhlich, M. Porcu, F. Giron, M. Enenkl, T. Kemp, N. Takahashi, and Y. Mitsufuji, “Improving music source separation based on deep networks through data augmentation and network blending,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 261–265, Mar. 2017.

N. Takahashi, N. Goswami, and Y. Mitsufuji, “MMDenseLSTM: An efficient combination of convolutional and recurrent neural networks for audio source separation,” in 16th International Workshop on Acoustic Signal Enhancement (IWAENC), pp. 106–110, IEEE, 2018.

J.-Y. Liu and Y.-H. Yang, “Dilated convolution with dilated GRU for music source separation,” in Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2019.

E. M. Grais and M. D. Plumbley, “Single channel audio source separation using convolutional denoising autoencoders,” in IEEE Global Conference on Signal and Information Processing (GlobalSIP), pp. 1265–1269, Nov. 2017.

A. Jansson, E. J. Humphrey, N. Montecchio, R. Bittner, A. Kumar, and T. Weyde, “Singing voice separation with deep U-Net convolutional networks,” in Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), pp. 323–332, 2017.

G. Fabbro, S. Uhlich, C.-H. Lai, W. Choi, M. Martinez-Ramirez, W. Liao, I. Gadelha, G. Ramos, E. Hsu, H. Rodrigues, et al., “The Sound Demixing Challenge 2023—Music demixing track,” arXiv preprint arXiv:2308.06979, 2023.

W.-H. Heo, H. Kim, and O.-W. Kwon, “Source separation using dilated time-frequency DenseNet for music identification in broadcast contents,” Applied Sciences, vol. 10, no. 5, p. 1727, Mar. 2020.

Q. Kong, Y. Cao, H. Liu, K. Choi, and Y. Wang, “Decoupling magnitude and phase estimation with deep ResUNet for music source separation,” in 22nd International Society for Music Information Retrieval Conference (ISMIR), 2021.

F.-R. Stöter, S. Uhlich, A. Liutkus, and Y. Mitsufuji, “Open-Unmix – a reference implementation for music source separation,” Journal of Open Source Software, vol. 4, no. 41, p. 1667, 2019.

N. Takahashi and Y. Mitsufuji, “D3Net: Densely connected multidilated DenseNet for music source separation,” arXiv preprint arXiv:2010.01733, 2020.

G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4700–4708, 2017.

Y. Luo and J. Yu, “Music source separation with band-split RNN,” arXiv preprint arXiv:2209.15174, 2022.

T. Li, J. Chen, H. Hou, and M. Li, “Sams-Net: A sliced attention-based neural network for music source separation,” in 2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 1–5, IEEE, 2021.

O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241, Springer, 2015.

Y. Cao, S. Liu, Y. Peng, and J. Li, “DenseUNet: densely connected UNet for electron microscopy image segmentation,” IET Image Processing, vol. 14, no. 12, pp. 2682–2689, 2020.

P. Jayashree, P. Rajesh, D. Amol, R. Nihar, and T. Mubin, “Gradient bald vulture optimization enabled multi-objective Unet++ with DCNN for prostate cancer segmentation and detection,” Biomedical Signal Processing and Control, vol. 87, art. no. 105474, 2024. https://doi.org/10.1016/j.bspc.2023.105474

T. Ince, S. Kiranyaz, O. C. Devecioglu, M. S. Khan, M. Chowdhury, and M. Gabbouj, “Blind Restoration of Real-World Audio by 1D Operational GANs,” arXiv preprint arXiv:2212.14618, 2022.

E. V. George and V. P. Devassia, “A novel U-Net with dense block for drum signal separation from polyphonic music signal mixture,” Signal, Image and Video Processing, vol. 17, no. 3, pp. 627–633, Apr. 2023.

E. V. George and V. P. Devassia, “A DenseU-Net for separation of vocals from polyphonic music signal mixture,” Grenze International Journal of Engineering and Technology, vol. 9, no. 1, pp. 2648–2655, Jan. 2023.

Z. Rafii, A. Liutkus, F.-R. Stöter, S. I. Mimilakis, and R. Bittner, “The MUSDB18 corpus for music separation,” Dec. 2017.

S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International Conference on Machine Learning (ICML), PMLR, pp. 448–456, 2015.

D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in 3rd International Conference on Learning Representations (ICLR), San Diego, CA, USA, May 7–9, 2015.

K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 1026–1034, 2015.

F.-R. Stöter, A. Liutkus, and N. Ito, “The 2018 signal separation evaluation campaign,” in International Conference on Latent Variable Analysis and Signal Separation, pp. 293–305, Springer, 2018.

E. Vincent, R. Gribonval, and C. Févotte, “Performance measurement in blind audio source separation,” IEEE Transactions on Audio, Speech and Language Processing, vol. 14, no. 4, pp. 1462–1469, Jul. 2006.

Published

12.06.2024

How to Cite

Vinitha George E., & Devassia, V. P. (2024). A DenseU-Net framework for Music Source Separation using Spectrogram Domain Approach. International Journal of Intelligent Systems and Applications in Engineering, 12(4), 77–85. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/6175

Issue

Vol. 12 No. 4 (2024)

Section

Research Article