Zero-Shot Invoice Information Extraction Using Foundation Models with Spatial Prompt Tuning

Authors

  • Ranadheer Reddy Charabuddi

Keywords:

Document Understanding Transformer, Foundation Models, Invoice, Prompt Tuning, Zero-Shot.

Abstract

Extracting structured information from scanned invoices is difficult because of diverse layouts, linguistic variability, and the scarcity of annotated training data. To address this, the study introduces a zero-shot invoice information extraction framework that combines the Donut foundation model with spatial prompt tuning. Unlike conventional OCR-based pipelines, the proposed approach operates directly on document images, without explicit text recognition or task-specific fine-tuning. The method was implemented in Python and evaluated on the SROIE v2 dataset, which comprises 973 annotated invoice images. Spatially aware natural language prompts guide the model’s attention toward relevant regions such as headers or totals. In the experiments, the model achieved 98.0% accuracy, surpassing baseline methods such as BiLSTM-CRF and LayoutLM by over 4%. The results support the model’s effectiveness and scalability for real-world document automation, especially in zero-shot settings with high template variability.
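
The abstract does not give the prompt wording or the scoring procedure, so the following is only a minimal sketch of how spatially aware prompts and a field-level accuracy score might look. The region descriptions in FIELD_REGIONS, the helper names build_spatial_prompt and field_accuracy, and the exact-match scoring rule are all assumptions made for illustration. The four field names are the key fields annotated in SROIE, and the tag wrapper follows the question format used by the publicly released Donut DocVQA checkpoint; whether the study used that format is not stated.

```python
# Hypothetical mapping from each SROIE key field to a coarse page region.
# The wording is illustrative; the study's actual prompts are not published here.
FIELD_REGIONS = {
    "company": "header at the top of the page",
    "address": "header at the top of the page",
    "date": "upper half of the page",
    "total": "totals block at the bottom of the page",
}


def build_spatial_prompt(field: str) -> str:
    """Return a Donut-style question prompt that names a page region (assumed format)."""
    region = FIELD_REGIONS[field]
    question = f"Looking at the {region}, what is the {field}?"
    # Question wrapper used by the public Donut DocVQA checkpoint.
    return f"<s_docvqa><s_question>{question}</s_question><s_answer>"


def field_accuracy(predictions: list[dict], references: list[dict]) -> float:
    """Exact-match accuracy over every (document, field) pair (assumed metric)."""
    correct = 0
    total = 0
    for pred, ref in zip(predictions, references):
        for field, expected in ref.items():
            total += 1
            # Compare case-insensitively after trimming surrounding whitespace.
            if pred.get(field, "").strip().lower() == expected.strip().lower():
                correct += 1
    return correct / total if total else 0.0


if __name__ == "__main__":
    print(build_spatial_prompt("total"))
    preds = [{"total": "9.00", "date": "01/01/2019"}]
    refs = [{"total": "9.00", "date": "02/01/2019"}]
    print(field_accuracy(preds, refs))
```

In a full pipeline, each prompt string would be tokenized and passed to the model's decoder together with the encoded invoice image; that step is omitted here because it depends on the specific checkpoint and settings used in the study.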

DOI: https://doi.org/10.17762/ijisae.v13i1s.7722

Published

25.02.2025

How to Cite

Ranadheer Reddy Charabuddi. (2025). Zero-Shot Invoice Information Extraction Using Foundation Models with Spatial Prompt Tuning. International Journal of Intelligent Systems and Applications in Engineering, 13(1s), 283 –. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/7722

Section

Research Article