Developing A Cloud-Based Natural Language Processing (NLP) Platform for Sentiment Analysis and Opinion Mining of Social Media Data

Authors

  • Ugandhar Dasi, Nikhil Singla, Rajkumar Balasubramanian, Siddhant Benadikar, Rishabh Rajesh Shanbhag

Keywords:

natural language processing; sentiment analysis; opinion mining; cloud computing; microservices; Kubernetes; deep learning; social media analytics

Abstract

With the rapid growth of user-generated content on social media platforms, there is an increasing need for efficient and scalable natural language processing (NLP) tools to analyze this large volume of text and derive insights from it. Sentiment analysis and opinion mining are two crucial NLP tasks that enable businesses, organizations, and researchers to understand public opinion, monitor brand reputation, and make data-driven decisions. This paper presents the development of a cloud-based NLP platform that leverages state-of-the-art deep learning models and big data technologies to perform large-scale sentiment analysis and opinion mining on social media data. The proposed platform uses a microservices architecture deployed on a Kubernetes cluster, enabling high scalability, fault tolerance, and easy integration with other systems. We evaluate the platform on multiple benchmark datasets and on real-world social media data, demonstrating its effectiveness in classifying sentiment polarity and extracting key opinion targets and aspects. The platform achieves an average F1-score of 0.87 for sentiment classification and 0.81 for aspect-based opinion mining. We also conduct a case study to showcase the platform's ability to monitor and analyze public opinion on a specific topic over time. The results highlight the potential of the proposed cloud-based NLP platform to support data-driven decision-making and provide valuable insights from social media data.
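
The abstract reports average F1-scores but does not say how the average is taken. As a minimal sketch, assuming the average is a macro-average over sentiment polarity classes (an assumption, not something the abstract confirms), the metric could be computed as follows. The function name `macro_f1` and the toy labels are hypothetical and are not data from the paper.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Macro-averaged F1: compute F1 per class, then take the unweighted mean."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    scores = []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        scores.append(2 * precision * recall / pr_sum if pr_sum else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels for illustration only.
y_true = ["pos", "pos", "neg", "neg", "neu", "neu"]
y_pred = ["pos", "neg", "neg", "neg", "neu", "pos"]
score = macro_f1(y_true, y_pred)
```

A macro-average weights every polarity class equally, which matters on social media data where neutral posts often outnumber the others; a micro- or weighted average would give different numbers on the same predictions.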

Published

09.07.2024

How to Cite

Ugandhar Dasi. (2024). Developing A Cloud-Based Natural Language Processing (NLP) Platform for Sentiment Analysis and Opinion Mining of Social Media Data. International Journal of Intelligent Systems and Applications in Engineering, 12(22s), 165–174. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/6406

Section

Research Article