An Integrated DIET-BO Model for Intent Classification and Entity Extraction
Keywords:
Chatbot, DIET, natural language processing, machine learning, pre-training
Abstract
The DIET (Dual Intent and Entity Transformer) architecture is known as an effective method for intent classification and entity extraction in chatbot systems. A key challenge, however, is determining the best set of hyperparameters (the number of iterations, the number of transformer layers, the transformer size, etc.) to obtain the best DIET architecture. With a huge number of possible hyperparameter-value combinations, the number of DIET architectures to be considered grows explosively. One solution to this problem is to integrate a statistical analysis technique such as Bayesian Optimization (BO) into the process of determining the best DIET architecture. This article proposes an integrated DIET-BO model, in which each DIET architecture is a candidate solution in the search space, the DIET training process is treated as the objective function, and BO is used to find the best DIET architecture in the space of candidate solutions. A hotel chatbot conversational dataset is used to evaluate the effectiveness of the integrated DIET-BO model. The experimental results show that the integrated DIET-BO model achieves an intent classification F1-score of 0.869 and an entity extraction F1-score of 0.913.
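As a rough illustration of the search loop described above, the sketch below casts DIET hyperparameter selection as Bayesian Optimization using the scikit-optimize library. The hyperparameter names, value ranges, and the train_and_evaluate_diet() helper are illustrative assumptions rather than the configuration used in the paper; in practice the helper would train a DIET model (for example, a Rasa NLU pipeline containing a DIETClassifier) with the given settings and return its validation F1-score.

from skopt import gp_minimize
from skopt.space import Integer, Categorical
from skopt.utils import use_named_args

# Search space: each point is one candidate DIET architecture
# (illustrative ranges, not the paper's actual search space).
search_space = [
    Integer(50, 300, name="epochs"),
    Integer(1, 4, name="number_of_transformer_layers"),
    Categorical([128, 256, 512], name="transformer_size"),
]

def train_and_evaluate_diet(epochs, number_of_transformer_layers, transformer_size):
    # Hypothetical placeholder: train a DIET model with these hyperparameters
    # and return its validation F1-score. A dummy value is returned here so
    # that the sketch runs end to end.
    return 0.5

@use_named_args(search_space)
def objective(**params):
    # BO minimizes the objective, so return the negative F1-score
    # of the DIET model trained with the candidate hyperparameters.
    return -train_and_evaluate_diet(**params)

# Gaussian-process-based Bayesian Optimization over the candidate architectures.
result = gp_minimize(objective, search_space, n_calls=30, random_state=0)
print("Best hyperparameters:", result.x)
print("Best F1-score:", -result.fun)

In the integrated model, each evaluation of the objective corresponds to one full DIET training run, and the surrogate model guides BO toward promising regions of the hyperparameter space instead of exhaustively enumerating all combinations.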
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license lets readers remix, transform, or build upon the material, provided they give appropriate credit, provide a link to the license, indicate if changes were made, and distribute their contributions under the same license as the original.