Combining Roberta Pre-Trained Language Model and NMF Topic Modeling Technique to Learn from Customer Reviews Analysis

Authors

  • Doae Mensouri, PhD Student, FST of Tangier, Abdelmalek Essaadi University, Tetouan, Morocco
  • Abdellah Azmani, Professor, FST of Tangier, Abdelmalek Essaadi University, Tetouan, Morocco
  • Monir Azmani, Professor, FST of Tangier, Abdelmalek Essaadi University, Tetouan, Morocco

Keywords

Customer reviews, Deep learning methods, Pre-trained language model, Sentiment analysis, NMF topic modeling

Abstract

In the past few years, more and more researchers have focused on natural language processing and, more specifically, on aspect-level sentiment analysis. Sentiment analysis is commonly used to analyse people's opinions or feelings about entities such as products or services. Given the immense amount of data generated daily in various forms on the web, it has become one of the most active areas of research today. In turn, online user reviews are considered a powerful marketing tool and have attracted widespread attention from both marketers and academics. This motivates the current study, which performs sentiment analysis with four deep learning models: three recurrent neural networks (LSTM, GRU, and Bi-LSTM) and a pre-trained language model (RoBERTa). The models are trained to categorize customer reviews from online platforms as positive or negative; each model is evaluated on accuracy, precision, recall, and F1 score, and the best-performing one is selected. Finally, a topic modeling technique (NMF) is applied to reveal the topics present in the data, determine what drove customers to write such reviews, and help suppliers take the appropriate decision in each case. Although the proposed approach is applied here to analyse customers' opinions of marketplace products and to uncover the topics discussed in their reviews, it can be used in any field to capture a writer's point of view on, for example, government policy, individuals, or brands.
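As an illustration of the two-step pipeline described above, the minimal sketch below (not the authors' implementation) classifies reviews as positive or negative with a publicly available RoBERTa sentiment checkpoint from Hugging Face, then applies TF-IDF weighting and scikit-learn's NMF to the negative reviews to surface the topics behind them. The checkpoint name, the toy reviews, and the number of topics are illustrative assumptions.

```python
# Minimal sketch of the abstract's pipeline (illustrative, not the authors' code).
from transformers import pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

# Toy reviews standing in for the marketplace dataset used in the study.
reviews = [
    "Fast delivery and the flowers were fresh, I will order again.",
    "The bouquet arrived two days late and half the stems were wilted.",
    "Customer service never answered my emails about the damaged box.",
    "Beautiful arrangement, exactly as pictured on the website.",
]

# Step 1: sentiment classification. The checkpoint below is a publicly available
# RoBERTa model fine-tuned for binary sentiment; the paper fine-tunes its own
# model and compares it with LSTM, GRU, and Bi-LSTM baselines.
classifier = pipeline("sentiment-analysis",
                      model="siebert/sentiment-roberta-large-english")
labels = [pred["label"] for pred in classifier(reviews)]
negatives = [text for text, label in zip(reviews, labels) if label == "NEGATIVE"]

# Step 2: topic modeling on the negative reviews. TF-IDF weights the
# document-term matrix, and NMF factorizes it into document-topic and
# topic-term matrices whose top terms hint at what drove the complaints.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(negatives)
n_topics = max(1, min(2, len(negatives)))   # keep the toy example well-posed
nmf = NMF(n_components=n_topics, init="nndsvda", random_state=0).fit(tfidf)

terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(nmf.components_):
    top_terms = [terms[i] for i in topic.argsort()[::-1][:5]]
    print(f"Topic {k}: {', '.join(top_terms)}")
```

Running the sketch prints the top terms of each topic extracted from the negative reviews, which is the kind of signal the study uses to explain why a review was written and what action a supplier could take in response.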

Published

16.01.2023

How to Cite

Mensouri, D., Azmani, A., & Azmani, M. (2023). Combining Roberta Pre-Trained Language Model and NMF Topic Modeling Technique to Learn from Customer Reviews Analysis. International Journal of Intelligent Systems and Applications in Engineering, 11(1), 39–49. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/2442

Issue

Vol. 11 No. 1 (2023)

Section

Research Article