A Review on String-Based Text Similarity Techniques in Computational Analysis
Keywords:
Text Similarity, String-Based Similarity, Character-Based Similarity, Term-Based Similarity
Abstract
Measuring text similarity between words, sentences, paragraphs, or documents is important in many Natural Language Processing (NLP) applications, such as information retrieval, word sense disambiguation, machine translation, and text summarization. In this paper, we present a survey of string-based methods for measuring text similarity. These methods fall under two broad approaches: character-based and term-based. Measuring text similarity between words, lines, or documents also plays an important role in research on text-related problems such as plagiarism detection.
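The sketch below makes the character-based versus term-based distinction concrete by implementing one measure from each family. We chose Levenshtein edit distance (character-based) and Jaccard similarity over word sets (term-based) for illustration only. The paper's own selection and definitions of measures may differ. The function names, the lowercasing, and the whitespace tokenization in `jaccard_similarity` are simplifying assumptions.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Character-based: minimum number of single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    # prev[j] holds the edit distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Normalize the edit distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


def jaccard_similarity(a: str, b: str) -> float:
    """Term-based: |A & B| / |A | B| over the sets of lowercased,
    whitespace-separated terms (assumed tokenization)."""
    terms_a = set(a.lower().split())
    terms_b = set(b.lower().split())
    union = terms_a | terms_b
    if not union:
        return 1.0  # treat two empty texts as identical
    return len(terms_a & terms_b) / len(union)


if __name__ == "__main__":
    print(levenshtein_distance("kitten", "sitting"))              # 3
    print(round(levenshtein_similarity("kitten", "sitting"), 3))  # 0.571
    print(round(jaccard_similarity("the cat sat on the mat",
                                   "the cat lay on the rug"), 3))  # 0.429
```

The two families respond differently to change. An edit distance is sensitive to spelling variation inside words. A set-based term measure ignores word order and repetition entirely and only counts which terms are shared.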
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license lets others share and adapt the material, provided they give appropriate credit, provide a link to the license, and indicate if changes were made. If they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.