An Enhanced Classification Model for Detecting Deceptive Content in Social Media using Natural Language Processing Techniques

Authors

  • Mohamed Shenify, Associate Professor, Department of Computer Science, Faculty of Computing & Information, Al-Baha University, Al-Baha, Kingdom of Saudi Arabia.

Keywords:

Social Media, Deceptive Content, Long Short-Term Memory (LSTM), Bidirectional Encoder Representations from Transformers (BERT), Natural Language Processing

Abstract

The rise of social networking sites has made it possible for people to express their opinions on products, services, films, and other media. A user's sentiment is their viewpoint on an issue, event, or service, and people's choices have always been influenced by their emotional state. Emotions in natural language have been studied extensively in recent years, but many issues remain open. One of the most serious is the lack of accurate classification resources. Researchers have found that the data sets used for training can introduce unintentional bias and unfairness, which leads to inaccurate classification of harmful terms in context. This research evaluates and reports several methods for detecting toxicity in text, with the goal of improving the overall quality of text classification. The proposed methods are a Long Short-Term Memory (LSTM) deep learning model with GloVe word embeddings and an LSTM with word embeddings produced by Bidirectional Encoder Representations from Transformers (BERT). The results show that LSTM with BERT word embeddings attained a precision of 94% and an F1 score of 0.89 in the binary classification of comments (toxic and nontoxic). The combination of LSTM and BERT outperformed both LSTM alone and LSTM with GloVe word embeddings. This work attempts to overcome the challenge of accurately classifying comments by relating models to a larger corpus of text (through high-quality word embeddings) rather than to the training data alone.
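
The two figures reported in the abstract are linked by the standard definition F1 = 2PR / (P + R). The sketch below is only an illustration of that relationship, not the author's evaluation code. It assumes that "precision" in the abstract means precision for the toxic class (the abstract does not say), the labels in the usage lines are hypothetical, and the function names are made up for this example.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive (toxic) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def implied_recall(precision, f1):
    """Solve F1 = 2PR / (P + R) for R, given P and F1."""
    return f1 * precision / (2 * precision - f1)


# Hypothetical labels (1 = toxic, 0 = nontoxic), not data from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
p, r, f1 = precision_recall_f1(y_true, y_pred)

# Recall implied by the abstract's figures under the assumption above.
r_implied = implied_recall(0.94, 0.89)
```

Under that assumption, a precision of 0.94 together with an F1 score of 0.89 would correspond to a recall of roughly 0.85. This is arithmetic on the reported numbers, not a result stated in the paper.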

Published

12.01.2024

How to Cite

Shenify, M. (2024). An Enhanced Classification Model for Detecting Deceptive Content in Social Media using Natural Language Processing Techniques. International Journal of Intelligent Systems and Applications in Engineering, 12(12s), 662 –. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/4550

Section

Research Article