Analysing the landscape of Deep Fake Detection: A Survey

Authors

  • Kishan Vyas, Department of Artificial Intelligence and Machine Learning, Symbiosis Institute of Technology, Pune, Maharashtra, India.
  • Preksha Pareek, Department of Artificial Intelligence and Machine Learning, Symbiosis Institute of Technology, Pune, Maharashtra, India.
  • Ruchi Jayaswal, Department of Artificial Intelligence and Machine Learning, Symbiosis Institute of Technology, Pune, Maharashtra, India.
  • Shruti Patil, Department of Artificial Intelligence and Machine Learning, Symbiosis Institute of Technology, Pune, Maharashtra, India.

Keywords:

DeepFake, Artificial Intelligence, Machine learning, Deep learning, CNN, RNN, Multimodality, Forensics, Cyber security

Abstract

With the rapid advancement of deep learning and generative modelling techniques, the creation of hyper-realistic synthetic media, known as deepfakes, has become a growing concern. These manipulated media assets appear in various domains, including politics, entertainment, and security. As a result, the development of effective deepfake detection systems has gained significant attention. To identify deepfakes, many researchers have developed binary-classification-based deepfake detection techniques (DFDT). This survey provides a comprehensive overview of state-of-the-art deepfake detection methods, their underlying principles, the datasets used for training and evaluation, and the challenges faced in this evolving field. We also review alternative methods discussed in the literature for addressing the issues raised by deepfakes. We analyse the techniques by grouping them into four categories: image-based DFDT, audio-based DFDT, video-based DFDT, and multimodality-based DFDT. Researchers in this area can benefit from this study, which covers recent methods for detecting deepfake videos and photos on social media. To benchmark and facilitate the advancement of deepfake detection systems, we examine numerous datasets, such as FaceForensics++, the DeepFake Detection Challenge (DFDC), and Celeb-DF. We also discuss their characteristics, variations, and limitations, emphasizing the need for diverse and realistic datasets to ensure model generalization.

Published

11.01.2024

How to Cite

Vyas, K., Pareek, P., Jayaswal, R., & Patil, S. (2024). Analysing the landscape of Deep Fake Detection: A Survey. International Journal of Intelligent Systems and Applications in Engineering, 12(11s), 40–55. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/4418

Section

Research Article