Vertical Text Detection and Recognition in Natural Scene Images: A Vertical Text Classifier and Detector with Gated Dual Adaptive Attention Mechanism

A. S. Venkata Praneel; T. Srinivasa Rao

Authors

A. S. Venkata Praneel, Department of Computer Science and Engineering, GITAM (Deemed-to-be University), Visakhapatnam-530045, AP, India
T. Srinivasa Rao, Department of Computer Science and Engineering, GITAM (Deemed-to-be University), Visakhapatnam-530045, AP, India

Keywords:

Vertical Text Classifier and Detector Module, IoU with Inclination Algorithm, Text Detection, Text Recognition, GDAAM, Semantic Reasoning, Vertical Text, Character Awareness

Abstract

This paper proposes an approach to improving vertical text detection and recognition in natural scene images by integrating the Vertical Text Classifier and Detector (VTCD) module, which incorporates the IoU with Inclination (IoUI) algorithm, into the Gated Dual Adaptive Attention Mechanism (GDAAM), a framework for text recognition in demanding settings. The integration is intended to increase GDAAM's accuracy and robustness on vertical text in complex visual scenes. The GDAAM encoder first localizes text regions in the natural scene image. VTCD is then applied to refine the bounding boxes and improve vertical text detection. The refined boxes are passed to the GDAAM decoder, where they inform fine-grained attention modelling: the model continually adjusts its attention weights according to the revised bounding boxes, so that recognition focuses selectively on the key visual and textual cues. To address irregular text shapes and varying orientations, the reasoning module uses VTCD's revised bounding boxes to gather contextual information, while character awareness is improved to handle complicated text layouts and occlusions. The visual-semantic ensemble fusion decoder integrates input from both modalities to produce coherent and contextually consistent recognition results. Extensive experiments on benchmark datasets, including ICDAR 2013, ICDAR 2015, and VTIG-500, indicate that GDAAM with VTCD achieves higher accuracy and robustness than state-of-the-art techniques, notably on difficult text recognition tasks. Adding VTCD to GDAAM extends text recognition in natural scene images to vertical text, with promising results in complex visual conditions.
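
The abstract names the IoU with Inclination (IoUI) algorithm and a vertical text classifier but does not define either. As a rough illustration only, the sketch below shows one way such a score could be put together: a standard axis-aligned IoU scaled by the absolute cosine of the angle difference between two boxes, plus a simple aspect-ratio test for verticality. The `Box` layout, the cosine weighting, the `ratio` threshold, and all function names are assumptions made for this example and are not taken from the paper.

```python
import math
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box with an inclination angle in degrees (hypothetical layout)."""
    x1: float
    y1: float
    x2: float
    y2: float
    angle: float = 0.0  # 0 = horizontal baseline, 90 = vertical


def iou(a: Box, b: Box) -> float:
    """Standard intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_with_inclination(a: Box, b: Box) -> float:
    """Assumed variant: IoU scaled by |cos| of the inclination difference."""
    delta = math.radians(a.angle - b.angle)
    return iou(a, b) * abs(math.cos(delta))


def is_vertical(box: Box, ratio: float = 1.5) -> bool:
    """Assumed rule: a box counts as vertical when it is clearly taller than wide."""
    width = box.x2 - box.x1
    height = box.y2 - box.y1
    return height >= ratio * width
```

Under these assumptions, two boxes that overlap perfectly but differ in inclination by 90 degrees score close to zero, so a horizontal proposal would not be matched to a vertical ground-truth region on overlap alone.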


