Strengthening Fake News Detection: A Resilient Model with Tweet Truth

Archana Nanade; Arun Kumar; Ashutosh Gupta; Arvind Sharma

Authors

Archana Nanade Sir Padampat Singhania University, Udaipur, Rajasthan
Arun Kumar Sir Padampat Singhania University, Udaipur, Rajasthan
Ashutosh Gupta Sir Padampat Singhania University, Udaipur, Rajasthan
Arvind Sharma Sir Padampat Singhania University, Udaipur, Rajasthan

Keywords:

Fake News, BERT, BERTweet, TweetTruth, Algorithm

Abstract

This study addresses the urgent need to combat misinformation by leveraging BERTweet for advanced fake news detection. The study begins by pretraining BERTweet on a diverse corpus, harnessing its ability to capture contextual relationships in social media text. Fine-tuning then uses a carefully curated dataset that covers a variety of sources and the deceptive writing styles commonly found in fake news. To enhance the model’s resilience, external knowledge sources such as fact-checking databases and reputable news outlets are integrated during both pretraining and fine-tuning. In addition, data augmentation techniques are employed to address potential data imbalances and to expose the model to the broader linguistic range of fake news on social media platforms.
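The abstract does not specify which augmentation techniques are used to counter data imbalance. Purely as an illustration, the following Python sketch assumes a simple EDA-style scheme (random token deletion plus one random token swap) and oversamples the minority label with augmented copies until every label has as many examples as the majority label. The function names `augment_tweet` and `balance_dataset`, the deletion probability, and the toy tweets are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter
from typing import List, Tuple

# Hypothetical sketch: EDA-style augmentation (random deletion + random swap).
# This is an illustrative assumption, not the paper's documented method.


def augment_tweet(text: str, rng: random.Random, p_delete: float = 0.1) -> str:
    """Return a perturbed copy of `text` built only from its own tokens."""
    tokens = text.split()
    if len(tokens) < 2:
        return text  # too short to perturb meaningfully
    # Random deletion: drop each token with probability p_delete, keep at least one.
    kept = [t for t in tokens if rng.random() >= p_delete]
    if not kept:
        kept = [rng.choice(tokens)]
    # Random swap: exchange two positions when at least two tokens survive.
    if len(kept) >= 2:
        i, j = rng.sample(range(len(kept)), 2)
        kept[i], kept[j] = kept[j], kept[i]
    return " ".join(kept)


def balance_dataset(samples: List[Tuple[str, str]], seed: int = 0) -> List[Tuple[str, str]]:
    """Oversample minority labels with augmented copies up to the majority count.

    The original samples are kept unchanged at the front of the result;
    augmented (text, label) pairs are appended after them.
    """
    rng = random.Random(seed)  # seeded so the augmentation is reproducible
    counts = Counter(label for _, label in samples)
    target = max(counts.values())
    balanced = list(samples)
    for label, count in counts.items():
        pool = [text for text, lab in samples if lab == label]
        for _ in range(target - count):
            balanced.append((augment_tweet(rng.choice(pool), rng), label))
    return balanced


if __name__ == "__main__":
    toy = [
        ("breaking miracle cure found doctors hate it", "fake"),
        ("city council approves new budget for schools", "real"),
        ("rain expected across the region this weekend", "real"),
        ("local library extends weekend opening hours", "real"),
    ]
    balanced = balance_dataset(toy, seed=42)
    # Each label now has 3 examples; the two extra "fake" rows are perturbed copies.
    print(Counter(label for _, label in balanced))
```

In a setup like this, augmented copies would normally be added to the training split only, so that evaluation still reflects real tweets. Stronger paraphrasing methods, such as back-translation or synonym replacement, could be substituted for `augment_tweet` without changing `balance_dataset`.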
