Text Simplification Improves Text Translation from Gujarati Regional Language to English: An Experimental Study
Keywords:
Text Simplification, Text Translation, Text Readability, Natural Language Processing, Indian Language
Abstract
Text translation plays an important role in extending the reach of information technology to a large portion of the population, because it helps to overcome the language barrier. An adequately translated and simplified text improves the quality of communication. Many researchers have recently proposed work on text translation; however, the grammar and complex structure of regional languages remain bottlenecks for effective translation. Several researchers have proposed simplifying the text before translating it, since text simplification improves the readability and understandability of the text. Nevertheless, simplification and translation of regional languages is still a challenging task. Gujarati is an Indian regional language. In this paper, an experimental setup is proposed for improved text translation from Gujarati to English. Results show that text simplification improves the quality of translation. We also experimented with text translation of the Indian national language, Hindi, which likewise showed an improvement in translation results.
License
Copyright (c) 2023 Dhawal Khem, Shailesh Panchal, Chetan Bhatt
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Under this license, users must give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.