A Comparative Analysis of Machine Learning Models Used for Hate Speech (Hs) Detection of Odia Language

Authors

  • Aloka Natha, C V Raman Global University, Bhubaneswar – 752054, Odisha
  • Bichitrananda Behera, C V Raman Global University, Bhubaneswar – 752054, Odisha
  • Debaswapna Mishra, C V Raman Global University, Bhubaneswar – 752054, Odisha
  • Saumya Ranjan Sahu, C V Raman Global University, Bhubaneswar – 752054, Odisha
  • Subhasis Mohapatra, Sri Sri University, Cuttack – 754006, Odisha

Keywords:

recall, precision, F1-score, accuracy

Abstract

Social media has changed how people communicate and interact around the world. However, the growth and anonymity of online media mean that the internet community can readily be exposed to hostile content. People who spread unpleasant or hateful posts targeting a group on the basis of gender, political stance, color, ethnicity, or any other characteristic must be identified, whether they act individually or collectively. Odia, one of India's six classical languages and a member of the Indo-Aryan family, is spoken by 82% of Odisha's native speakers, while the remaining 18% of speakers live in the states of West Bengal, Jharkhand, Andhra Pradesh, and Chhattisgarh. Because resources are scarce, relatively little research has addressed hate speech (HS) detection in the Odia language. The primary goal of this paper is to compile posts and comments from social media pages into an HS dataset for Odia. The dataset has two categories: HS and non-HS. Feature extraction methods and machine learning classification algorithms can be applied to this dataset to identify HS patterns in a given social media post. A comparative analysis is carried out by training several machine learning models on the generated HS dataset. The models' performance is compared using several metrics: accuracy, precision, recall, and F1-score.
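
As a hedged illustration of how the four metrics named in the abstract are computed, the following Python sketch derives them from confusion-matrix counts for the binary HS / non-HS setting. The labels, the two sets of predictions, and the names `binary_metrics`, `model_a`, and `model_b` are invented for illustration; they are not the paper's data, models, or results.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for a binary task (positive = HS)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels (1 = HS, 0 = non-HS) and predictions from two hypothetical models.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
model_a = [1, 1, 0, 0, 0, 1, 1, 0]
model_b = [1, 0, 0, 0, 0, 0, 1, 0]

results = {
    "model_a": binary_metrics(y_true, model_a),
    "model_b": binary_metrics(y_true, model_b),
}

for name, scores in results.items():
    summary = ", ".join(f"{metric}={value:.3f}" for metric, value in scores.items())
    print(f"{name}: {summary}")
```

In this toy data both prediction sets reach the same accuracy (0.75), but the second has higher precision and lower recall, which is why a comparison of this kind typically reports all four metrics rather than accuracy alone.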

Published

23.02.2024

How to Cite

Natha, A., Behera, B., Mishra, D., Sahu, S. R., & Mohapatra, S. (2024). A Comparative Analysis of Machine Learning Models Used for Hate Speech (Hs) Detection of Odia Language. International Journal of Intelligent Systems and Applications in Engineering, 12(16s), 464–476. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/4859

Section

Research Article