An Integrated DIET-BO Model for Intent Classification and Entity Extraction

Authors

  • Viet Minh Nhat Vo, Hue University, Hue - 530000, Viet Nam
  • Van Son Ngo, School of Hospitality and Tourism – Hue University, Hue - 530000, Viet Nam

Keywords

Chatbot, DIET, natural language processing, machine learning, pre-training

Abstract

The DIET (Dual Intent and Entity Transformer) architecture is known as an effective method for intent classification and entity extraction in chatbot systems. A remaining challenge, however, is determining the best set of hyperparameters, such as the number of iterations, the number of transformer layers, and the transformer size, to obtain the best DIET architecture. Because these hyperparameters can be combined in a huge number of ways, the number of candidate DIET architectures to consider grows explosively. One solution to this problem is to integrate a statistical optimization technique such as Bayesian Optimization (BO) into the process of determining the best DIET architecture. This article proposes an integrated DIET-BO model in which each DIET architecture is a candidate solution in the search space, the DIET training process serves as the objective function, and BO searches the space of candidate solutions for the best DIET architecture. A hotel chatbot conversational dataset is used to evaluate the effectiveness of the integrated DIET-BO model. The experimental results show that the integrated DIET-BO model achieves an intent classification F1-score of 0.869 and an entity extraction F1-score of 0.913.
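
To make the search procedure concrete, the sketch below shows one way such a BO loop over DIET hyperparameters could be set up in Python with scikit-optimize. The hyperparameter ranges and the train_and_evaluate_diet() helper are illustrative assumptions for this sketch, not the authors' exact configuration; in practice the helper would train a DIET model (for example, Rasa's DIETClassifier) and report its F1-score on a held-out split of the conversational dataset.

# Minimal sketch of the DIET-BO idea, assuming scikit-optimize (skopt) as the
# Bayesian Optimization backend. The hyperparameter ranges and the
# train_and_evaluate_diet() helper are hypothetical placeholders.
from skopt import gp_minimize
from skopt.space import Categorical, Integer
from skopt.utils import use_named_args

# Each candidate DIET architecture is a point in this search space.
search_space = [
    Integer(50, 300, name="epochs"),                       # number of iterations
    Integer(1, 4, name="num_transformer_layers"),
    Categorical([128, 256, 512], name="transformer_size"),
]


def train_and_evaluate_diet(epochs, num_transformer_layers, transformer_size):
    """Hypothetical stand-in for the real objective: train a DIET model with the
    given hyperparameters (e.g. via Rasa) and return its F1-score on a held-out
    split of the dataset. A synthetic score keeps this sketch runnable."""
    return (0.80
            + 0.0002 * epochs
            - 0.01 * abs(num_transformer_layers - 2)
            - 0.00005 * abs(transformer_size - 256))


@use_named_args(search_space)
def objective(**params):
    # gp_minimize minimizes, so return the negative F1-score.
    return -train_and_evaluate_diet(**params)


# Gaussian-process-based BO: each call corresponds to training one candidate
# DIET architecture and observing its F1-score.
result = gp_minimize(objective, search_space, n_calls=25, random_state=42)
print("Best F1-score:", -result.fun)
print("Best hyperparameters (epochs, layers, transformer size):", result.x)

Each iteration of gp_minimize fits a Gaussian-process surrogate to the scores observed so far and uses an acquisition function to choose the next hyperparameter combination, so far fewer DIET training runs are needed than with an exhaustive grid over all combinations.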

Published

21.09.2023

How to Cite

Vo, V. M. N., & Ngo, V. S. (2023). An Integrated DIET-BO Model for Intent Classification and Entity Extraction. International Journal of Intelligent Systems and Applications in Engineering, 11(4), 666–673. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/3602

Issue

Vol. 11 No. 4 (2023)

Section

Research Article
