Parameter-Efficient Fine-Tuning of LLMs for Low-Resource Deployment in Real-World Applications
Keywords:
Adapters, BitFit, edge AI, large language models, LoRA, model quantization, low-resource deployment, parameter-efficient fine-tuning

Abstract
Large language models (LLMs) currently achieve strong results across natural language processing tasks, but their high computational and memory requirements pose major barriers to deployment in low-resource environments. This paper investigates parameter-efficient fine-tuning (PEFT) techniques, including LoRA, Adapters, Prefix Tuning, and BitFit, as feasible alternatives to full model fine-tuning. We benchmark these methods on classification, question answering, and summarization tasks using LLaMA-7B and Flan-T5, evaluating both task performance and resource efficiency. Our results show that PEFT methods drastically reduce the number of trainable parameters and memory usage while maintaining competitive accuracy. We further validate these findings through two real-world case studies: a Raspberry Pi-based offline education assistant and a mobile health triage app. These experiments demonstrate that PEFT methods, when combined with quantization, enable effective and sustainable deployment of LLMs in constrained environments.
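To make the approach concrete, the minimal sketch below shows how a LoRA adapter can be attached to a 4-bit-quantized 7B causal language model using the Hugging Face transformers, peft, and bitsandbytes libraries. The checkpoint name, target modules, and hyperparameters are illustrative assumptions for this sketch, not the exact configuration reported in the paper.

```python
# Sketch only: checkpoint name and hyperparameters are assumptions,
# not the authors' exact experimental setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "huggyllama/llama-7b"  # assumed 7B checkpoint

# Load the frozen base model in 4-bit precision to keep memory low.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Prepare the quantized model for training and attach low-rank adapters.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank update
    lora_alpha=16,                         # scaling factor
    target_modules=["q_proj", "v_proj"],   # attention projections (illustrative choice)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
```

Only the low-rank adapter matrices are updated during fine-tuning; the frozen base weights stay in 4-bit precision, which is what keeps the memory footprint compatible with the constrained deployments discussed above.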
References
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., ... & Amodei, D. (2020). Language models are few-shot learners. In Advances in Neural Information Processing Systems (Vol. 33, pp. 1877–1901).
Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140), 1–67.
Gholami, A., Yao, Z., Mahoney, M. W., & Keutzer, K. (2021). A survey of quantization methods for efficient neural network inference. arXiv preprint arXiv:2103.13630.
Zhao, W., Wang, K., Li, Z., Xu, Y., Yan, Y., He, X., & Ma, J. (2021). Pre-trained language models: Past, present and future. AI Open, 2, 91–105.
Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, L., & Chen, W. (2022). LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685.
Houlsby, N., Giurgiu, A., Jastrzębski, S., Morrone, B., De Laroussilhe, Q., Gesmundo, A., ... & Gelly, S. (2019). Parameter-efficient transfer learning for NLP. In International Conference on Machine Learning (pp. 2790–2799). PMLR.
Zaken, E. B., Goldberg, Y., & Ravfogel, S. (2021). BitFit: Simple parameter-efficient fine-tuning for transformer-based masked language-models. arXiv preprint arXiv:2106.10199.
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT (pp. 4171–4186).
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog, 1(8), 9.
Li, X. L., & Liang, P. (2021). Prefix-tuning: Optimizing continuous prompts for generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (pp. 4582–4597).
Lester, B., Al-Rfou, R., & Constant, N. (2021). The power of scale for parameter-efficient prompt tuning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (pp. 3045–3059).
Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., & Neubig, G. (2022). Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys, 55(9), 1–35.
Shen, S., Tang, J., Tan, Z., Huang, D., Zhang, Y., & Cui, P. (2021). Towards efficient AI: A survey on advances of federated learning in edge computing. ACM Computing Surveys, 54(8), 1–36.
Han, S., Mao, H., & Dally, W. J. (2016). Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. In International Conference on Learning Representations.
Jacob, B., Kligys, S., Chen, B., Zhu, M., Tang, M., Howard, A., ... & Kalenichenko, D. (2018). Quantization and training of neural networks for efficient integer-arithmetic-only inference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2704–2713).
Frankle, J., & Carbin, M. (2019). The lottery ticket hypothesis: Finding sparse, trainable neural networks. In International Conference on Learning Representations.
Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). QLoRA: Efficient fine-tuning of quantized LLMs. arXiv preprint arXiv:2305.14314.