A Three-Order Ensemble Model for User-level Big Five Personality Prediction on Twitter Dataset

Henry Lucky; Ghinaa Zain Nabiilah; Nicholaus Hendrik Jeremy; Derwin Suhartono

Authors

Henry Lucky, Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta 11480, Indonesia. ORCID: https://orcid.org/0000-0002-4233-0409
Ghinaa Zain Nabiilah, Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta 11480, Indonesia. ORCID: https://orcid.org/0000-0001-7638-7449
Nicholaus Hendrik Jeremy, Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta 11480, Indonesia. ORCID: https://orcid.org/0000-0003-3242-365X
Derwin Suhartono, Computer Science Department, School of Computer Science, Bina Nusantara University, Jakarta 11480, Indonesia. ORCID: https://orcid.org/0000-0002-3271-5874

Keywords:

Big Five, Personality prediction, IndoBERT, IndoBERTweet, Indonesian Twitter, Ensemble model

Abstract

The rapid growth of social media has changed how people interact and communicate, and Twitter is one example. On Twitter, users can express themselves and their feelings directly and freely, so the platform can become, often without the user being aware of it, a medium that reflects their personality. Natural Language Processing (NLP) models can be used to predict personality automatically. This study therefore conducts experiments to predict the personality of Indonesian Twitter users based on the Big Five personality traits. Previous research on personality prediction using BERT has produced promising results. However, BERT can process only a limited number of words at a time, whereas predicting personality at the user level is better served by using all of a user's available information. This research therefore proposes the Three-Order Ensemble Method with a BERT workflow (TOEM-BERT), a scheme for combining tweets so that the tweet data can be used optimally. The testing phase consists of two experimental scenarios using two BERT models, IndoBERT and IndoBERTweet. In the parallel test scenario, each model is evaluated on its own test set; in the linear test scenario, all models are evaluated on the same test set. The experiments show that the proposed TOEM-BERT method performs better in all test scenarios, obtaining a weighted F1 of 78.41% in the linear test using IndoBERT and 77.84% in the parallel test using IndoBERTweet.
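
The results above are reported as weighted F1. The abstract does not say how the score was computed, so the following is only a minimal sketch that assumes the common support-weighted definition: the F1 score of each class, averaged with weights proportional to the number of true examples of that class. The function name `weighted_f1` and the toy labels are illustrative and are not taken from the paper.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (assumed definition)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    support = Counter(y_true)  # number of true examples per class
    total = 0.0
    for label, count in support.items():
        # True positives and false positives for this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = count - tp  # true examples of this class that were missed
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += count * f1  # weight each class by its support
    return total / len(y_true)


# Toy example for a single binary trait (1 = high, 0 = low).
y_true = [1, 1, 1, 0]
y_pred = [1, 1, 0, 0]
print(round(weighted_f1(y_true, y_pred), 4))
```

For a multi-label setting such as the five Big Five traits, a score of this kind would typically be computed per trait and then combined; how the paper combines them is not stated in the abstract.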


