Novel Opinion mining System for Movie Reviews

Keywords: Ensemble Learning, Opinion Mining, Sentiment Analysis, Text Classification.

Abstract

This paper presents an efficient opinion mining system. Opinion Mining (OM) turns opinions available online into useful knowledge. The proposed system combines Word2Vec, a state-of-the-art text feature extraction method, with ensemble learning algorithms for classification. The tested ensemble methods are Random Forest (RF), the AdaBoost Classifier, and the Gradient-Boosting Classifier (GBC). The challenging benchmark “IMDB Movies Reviews” dataset is used for the experimental comparison and verification, and the proposed method is compared with well-known machine learning algorithms: Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Naive Bayes (NB). The experiments show that the performance of SVM, KNN, and NB is comparable. However, the performance, robustness, and stability of the system improve significantly when ensemble learning is adopted together with Word2Vec and efficient preprocessing of the data.
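The pipeline outlined in the abstract (preprocessing, word-vector features, ensemble classification) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the `EMBEDDINGS` table holds hand-made toy vectors standing in for trained Word2Vec vectors, and a bagged nearest-centroid classifier with majority voting stands in for the RF, AdaBoost, and GBC ensembles. All function names, the toy reviews, and the parameter values are assumptions made for illustration.

```python
import random
from collections import Counter

# Assumption: toy 2-d vectors standing in for trained Word2Vec embeddings.
EMBEDDINGS = {
    "great": (0.9, 0.1), "good": (0.8, 0.2), "loved": (0.85, 0.15),
    "bad": (0.1, 0.9), "awful": (0.05, 0.95), "boring": (0.2, 0.8),
    "movie": (0.5, 0.5), "plot": (0.5, 0.5),
}

def preprocess(text):
    """Lowercase, replace punctuation with spaces, split on whitespace."""
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in text.lower())
    return cleaned.split()

def doc_vector(tokens):
    """Average the embeddings of known tokens (zeros if none are known)."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(col) / len(vecs) for col in zip(*vecs))

def fit_centroids(X, y):
    """Base learner: one mean vector per class."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {lab: tuple(sum(col) / len(xs) for col in zip(*xs))
            for lab, xs in groups.items()}

def predict_centroids(centroids, x):
    """Return the label of the closest class centroid."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(centroids[lab], x)))

def fit_bagging(X, y, n_estimators=15, seed=0):
    """Train base learners on bootstrap samples that contain every class."""
    rng = random.Random(seed)
    classes = set(y)
    models = []
    while len(models) < n_estimators:
        idx = [rng.randrange(len(X)) for _ in X]
        if {y[i] for i in idx} != classes:
            continue  # resample: this bootstrap draw is missing a class
        models.append(fit_centroids([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict_bagging(models, x):
    """Majority vote over the base learners."""
    votes = Counter(predict_centroids(m, x) for m in models)
    return votes.most_common(1)[0][0]

# Hypothetical toy reviews, not taken from the IMDB dataset.
reviews = [
    ("Great movie, loved it", "pos"),
    ("Good plot and good acting", "pos"),
    ("Loved the great plot", "pos"),
    ("Bad movie, awful plot", "neg"),
    ("Boring and bad", "neg"),
    ("Awful, boring movie", "neg"),
]
X_train = [doc_vector(preprocess(text)) for text, _ in reviews]
y_train = [label for _, label in reviews]
models = fit_bagging(X_train, y_train)

def classify(text):
    return predict_bagging(models, doc_vector(preprocess(text)))
```

Calling `classify("A good movie with a great plot")` runs the three stages end to end. In a real setting the embedding table would come from a Word2Vec model trained on the review corpus, and the base learners would typically be decision trees.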

Published
2020-06-26
How to Cite
[1]
A. H. AbdulHafiz, “Novel Opinion mining System for Movie Reviews”, IJISAE, vol. 8, no. 2, pp. 94-101, Jun. 2020.
Section
Research Article