Improving the Identification of Hate Speech in Arabic Social Media Content Using Emojis Translation

Authors

  • Khadidja Zerrouki, Nadjia Benblidia, Omar Boussaid

Keywords:

Hate speech; Offensive language; Arabic text pre-processing; Emojis; Deep learning; Bi-LSTM

Abstract

Hate speech on the internet threatens the well-being and safety of people who use online platforms, so social networks need reliable methods to detect it and maintain a constructive atmosphere. Extracting information from Arabic text posted on social networking platforms, however, poses considerable challenges. This paper presents a novel approach that uses artificial intelligence techniques to detect hate speech in Arabic-language social media content. A supervised deep learning model is built on the Bi-LSTM (Bidirectional Long Short-Term Memory) architecture, with Arabic text pre-processing techniques applied to improve its overall performance. The model was trained and evaluated on a compilation of four public Arabic hate speech datasets sourced from various social media platforms. The empirical results show that the proposed model reaches an accuracy of 98.4%. The model also generalizes well, identifying hate speech in Arabic text from several sources with varying degrees of complexity. Moreover, the study provides empirical evidence that pre-processing emojis rather than removing them improves the effectiveness of deep learning models in detecting hate speech in Arabic social media text.
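
To make the contrast between translating emojis and removing them concrete, the following is a minimal Python sketch, not the authors' implementation. The lookup table `EMOJI_TO_ARABIC`, its entries, and the function names `translate_emojis` and `remove_emojis` are all hypothetical and chosen only for illustration; the abstract does not state which emoji lexicon or translation procedure the paper uses.

```python
# Hypothetical emoji-to-Arabic lookup table. The entries are illustrative
# assumptions, not the lexicon used in the paper.
EMOJI_TO_ARABIC = {
    "😡": "غاضب",      # angry face -> "angry"
    "😂": "يضحك",      # face with tears of joy -> "laughing"
    "\u2764": "حب",    # heavy black heart -> "love"
    "👍": "جيد",       # thumbs up -> "good"
}

VARIATION_SELECTOR = "\ufe0f"  # invisible code point that often follows an emoji


def translate_emojis(text, table=EMOJI_TO_ARABIC):
    """Replace each known emoji with its Arabic word, keeping the signal."""
    pieces = []
    for ch in text:
        if ch == VARIATION_SELECTOR:
            continue  # drop the invisible selector
        if ch in table:
            # Pad with spaces so the word does not fuse with its neighbours.
            pieces.append(" " + table[ch] + " ")
        else:
            pieces.append(ch)
    # Collapse the extra whitespace introduced by the padding.
    return " ".join("".join(pieces).split())


def remove_emojis(text, table=EMOJI_TO_ARABIC):
    """Baseline for comparison: delete known emojis instead of translating."""
    kept = [ch for ch in text if ch not in table and ch != VARIATION_SELECTOR]
    return " ".join("".join(kept).split())


if __name__ == "__main__":
    sample = "مرحبا😡"
    print(translate_emojis(sample))  # emoji becomes an Arabic token
    print(remove_emojis(sample))     # emoji is discarded
```

In this sketch the translated text keeps a token for the emoji that a downstream tokenizer and classifier can use, whereas the removal baseline discards it. A real pipeline would need a much larger lexicon and handling for multi-code-point emoji sequences, which this sketch does not attempt.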

Published

12.06.2024

How to Cite

Khadidja Zerrouki. (2024). Improving the Identification of Hate Speech in Arabic Social Media Content Using Emojis Translation. International Journal of Intelligent Systems and Applications in Engineering, 12(4), 3791 –. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/6927

Section

Research Article