Multimodal Cognitive Learning for Media Forgery Detection: A Comprehensive Framework Combining Random Forest and Deep Ensemble Architectures (Xception, ResNeXt) across Image, Video, and Audio Modalities

Authors

  • A. Abirami, S. Bhuvaneswari, Krithika K, Nithyasree I, Prashithaa Abhirami Balaji, Aadhithya R, Deexith P, Devesh R

Keywords:

Deepfake detection, Multi-modal system, Image manipulation, Video forgery, Audio spoofing, Convolutional Neural Networks (CNNs), Xception, ResNeXt, Spatiotemporal analysis, Mel spectrograms, F1 score, ROC curve, AUC.

Abstract

Deepfake content has become increasingly prevalent as the underlying technology evolves, undermining the reliability and integrity of digital media. This study presents an integrated multi-modal deepfake detection system that addresses manipulated images, videos, and audio recordings. The image detection module uses Convolutional Neural Networks (CNNs), including Xception and ResNeXt, to examine visual data for signs of manipulation; it distinguishes real from fake images by analyzing pixel-level attributes and contextual information. The video detection module extends this capability to video: using spatiotemporal CNNs (Xception and ResNeXt), it analyzes video frames for subtle inconsistencies in order to identify deepfake videos. The audio detection module completes the system, using Mel spectrograms and CNNs to differentiate authentic from spoofed audio recordings. We also provide a unifying framework that integrates the three detection modules to improve the overall effectiveness and performance of the system. We evaluate the solution using accuracy, F1 score, the ROC curve, and AUC, and we illustrate the model structures to aid understanding. The resulting system is intended as a safeguard against widely disseminated false information, helping users distinguish authentic from fabricated content across media types and supporting the legitimacy of digital content.
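The abstract names accuracy, F1 score, and AUC as evaluation metrics but does not give their computation. The following is a minimal sketch of how such metrics can be computed from a detector's scores, assuming binary labels (1 = fake, 0 = real), a decision threshold of 0.5, and scores in [0, 1]. The function names, the threshold, and the sample data are illustrative assumptions and are not taken from the paper.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(labels: Sequence[int], scores: Sequence[float],
                    threshold: float = 0.5) -> Tuple[float, float]:
    """Return (accuracy, F1), treating 1 (fake) as the positive class.

    The 0.5 default threshold is an assumption for illustration.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    accuracy = (tp + tn) / len(labels)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via its rank interpretation.

    Equals the probability that a randomly chosen fake sample receives a
    higher score than a randomly chosen real sample (ties count as 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and detector scores, for illustration only.
example_labels = [1, 1, 1, 0, 0, 0]
example_scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
example_accuracy, example_f1 = accuracy_and_f1(example_labels, example_scores)
example_auc = roc_auc(example_labels, example_scores)
```

Accuracy and F1 depend on the chosen threshold, whereas AUC summarizes ranking quality over all thresholds, which is one reason to report them together.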

Published

24.03.2024

How to Cite

A. Abirami, S. Bhuvaneswari, Krithika K, Nithyasree I, Prashithaa Abhirami Balaji, Aadhithya R, Deexith P, & Devesh R. (2024). Multimodal Cognitive Learning for Media Forgery Detection: A Comprehensive Framework Combining Random Forest and Deep Ensemble Architectures (Xception, ResNeXt) across Image, Video, and Audio Modalities. International Journal of Intelligent Systems and Applications in Engineering, 12(3), 2618–2625. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/5734

Section

Research Article