Recognition of Historical Kannada Manuscripts using Convolution Neural Network
Keywords:
Historical Kannada documents, Image recognition, Document analysis, Cultural heritage preservation, Convolutional Neural Network
Abstract
Document image analysis has grown steadily in importance over the last few decades. Historical documents pose the additional challenge of physical degradation, which must be addressed during pre-processing. The present work focuses on the classification and identification of characters in old Kannada stone inscriptions. The inscription images are segmented into lines, words, and characters for easier processing. The segmented images are then preprocessed to extract the essential features and remove redundancies. The preprocessed data is augmented to compensate for the scarcity of datasets, and the resulting dataset is used for the training phase. A Convolutional Neural Network (CNN) is selected as the machine learning model. Classifiers based on the model are trained, and their performance is evaluated. The model developed for recognizing Kannada characters achieved a validation accuracy of 95.9%. This is a significant result for processing and digitizing ancient Kannada scripts, considering the complexity of the script and the diverse characteristics of individual handwriting.
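As an illustration of the segmentation and augmentation stages described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a binarized image (1 = ink, 0 = background) and uses projection profiles to split the image into lines and then characters; the abstract does not state which segmentation or augmentation technique was used, so the projection-profile approach, the random-shift augmentation, and the names `split_runs`, `segment_characters`, and `augment` are all hypothetical. The word-level step is omitted for brevity.

```python
import numpy as np

def split_runs(profile, min_ink=1):
    """Return (start, stop) pairs for consecutive runs where profile >= min_ink."""
    mask = profile >= min_ink
    # Pad with False on both sides so every run has a detectable start and end.
    padded = np.concatenate(([False], mask, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(a), int(b)) for a, b in zip(changes[0::2], changes[1::2])]

def segment_characters(binary):
    """Split a binary image (1 = ink) into character crops (assumed method)."""
    crops = []
    # Rows that contain ink form text lines (horizontal projection profile).
    for top, bottom in split_runs(binary.sum(axis=1)):
        line = binary[top:bottom]
        # Columns that contain ink within a line form individual glyphs.
        for left, right in split_runs(line.sum(axis=0)):
            crops.append(line[:, left:right])
    return crops

def augment(glyph, rng, max_shift=1):
    """Hypothetical augmentation: random small translation with zero padding."""
    dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    h, w = glyph.shape
    shifted = np.zeros_like(glyph)
    # Take the part of the glyph that stays inside the frame after shifting.
    src = glyph[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    y0, x0 = max(0, dy), max(0, dx)
    shifted[y0:y0 + src.shape[0], x0:x0 + src.shape[1]] = src
    return shifted

# Toy "inscription": two glyphs on the first line, one on the second.
page = np.zeros((7, 9), dtype=np.uint8)
page[1:3, 1:3] = 1
page[1:3, 5:8] = 1
page[4:6, 2:4] = 1

crops = segment_characters(page)
rng = np.random.default_rng(0)
augmented = [augment(c, rng) for c in crops]
print([c.shape for c in crops])
```

In a full pipeline the character crops would be resized to a fixed shape before being passed to the CNN; real inscriptions with touching or broken strokes would likely need a more robust segmentation than plain projection profiles.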
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Under this license, users must give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.