A Note on the String Metric for Word Similarity

Authors

  • Sanil Shanker K. P., Megha Narayanan, Arunodhaya K. Nambiar

Keywords

Distance measure, Hamming distance, Levenshtein distance, String metric, Word matching

Abstract

This paper presents a string metric for measuring the similarity between words. The distance function satisfies the metric axioms of non-negativity, reflexivity, symmetry, and the triangle inequality. The metric is compared with the Hamming and Levenshtein distances on a word matching task.
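The proposed metric itself is not defined on this page, so it is not reproduced here. As a rough illustration of the two baselines named in the abstract and of what checking the four axioms involves, the following is a minimal sketch in Python. The function names, the helper `satisfies_metric_axioms`, and the sample word list are illustrative assumptions and are not taken from the paper.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b (row-by-row DP)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


def satisfies_metric_axioms(dist, words) -> bool:
    """Brute-force check of the four axioms on a finite word sample.
    Passing is evidence on that sample only, not a proof."""
    for x, y in product(words, repeat=2):
        if dist(x, y) < 0:            # non-negativity
            return False
        if dist(x, y) != dist(y, x):  # symmetry
            return False
    if any(dist(x, x) != 0 for x in words):  # reflexivity
        return False
    return all(dist(x, z) <= dist(x, y) + dist(y, z)  # triangle inequality
               for x, y, z in product(words, repeat=3))


# Hypothetical sample words, all of equal length so Hamming applies.
WORDS = ["kitten", "sitten", "sittin", "mitten"]

print(hamming("kitten", "sittin"))       # 2
print(levenshtein("kitten", "sitting"))  # 3
print(satisfies_metric_axioms(hamming, WORDS))
print(satisfies_metric_axioms(levenshtein, WORDS + ["sitting", ""]))
```

Any candidate distance function with the same two-argument signature could be passed to `satisfies_metric_axioms` in place of the baselines, which gives a quick sanity check before attempting a formal proof.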

References

Sven Kosub. A note on the triangle inequality for the Jaccard distance, Pattern Recognition Letters, 2019 (120): 36-38. https://doi.org/10.1016/j.patrec.2018.12.007

Tibrewal B., Chaudhury G.S., Chakraborty S., Kairi A. Rough Set-Based Feature Subset Selection Technique Using Jaccard’s Similarity Index. In: Chakraborty M., Chakrabarti S., Balas V., Mandal J. (eds) Proceedings of International Ethical Hacking Conference 2018. Advances in Intelligent Systems and Computing, vol 811. Springer, Singapore. 2019. https://doi.org/10.1007/978-981-13-1544-2_39.

Kretz T., Bönisch C., Vortisch P. Comparison of Various Methods for the Calculation of the Distance Potential Field. In: Klingsch W., Rogsch C., Schadschneider A., Schreckenberg M. (eds) Pedestrian and Evacuation Dynamics 2008. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04504-2_29.

José M. Merigó and Anna M. Gil-Lafuente. Using the OWA Operator in the Minkowski Distance, World Academy of Science, Engineering and Technology. 2008: 21.

P. Mahalanobis. On the generalized distance in statistics, Proc. Nat. Inst. Sci. India (Calcutta). 1936(2): 49–55.

Hamming, Richard W. Error detecting and error correcting codes, The Bell System Technical Journal. 1950: 147-160. https://doi.org/10.1002/j.1538-7305.1950.tb00463.x

Levenshtein, Vladimir. Binary codes capable of correcting spurious insertions and deletions of ones, Problems of Information Transmission. 1965: 8-17.

Cohen, William, Pradeep Ravikumar, Stephen Fienberg. A comparison of string metrics for matching names and records, KDD workshop on data cleaning and object consolidation. 2003 (3).

Zhao C., Sahni S. String correction using the Damerau-Levenshtein distance, BMC Bioinformatics. 2019: 1-28. https://doi.org/10.1186/s12859-019-2819-0

Fred J. Damerau. A technique for computer detection and correction of spelling errors, Communications of the ACM. 1964: 171-176. https://doi.org/10.1145/363958.363994

Van der Loo, Mark P. J. The stringdist package for approximate string matching, The R Journal, 6(1). 2014.

Rajaraman, Anand, and Jeffrey David Ullman. Mining of massive datasets. Cambridge University Press. 2011.

Sanil Shanker KP, Elizabeth Sherly, Jim Austin. A note on two applications of Logical Matching Strategy, Applied Artificial Intelligence. 2011: 708-720.

Carla Pires, Afonso Cavaco, Marina Vigário. Towards the Definition of Linguistic Metrics for Evaluating Text Readability, Journal of Quantitative Linguistics. 2017: 319-349. https://doi.org/10.1080/09296174.2017.1311448

Needleman S. B. and Wunsch C. D. A general method applicable to the search for similarities in the amino acid sequence of two proteins, J. Mol. Biol. 1970(48): 443–453.

Smith T. F. and Waterman M. S. Identification of Common Molecular Subsequences, J. Mol. Biol. 1981: 195–197.

Simpson, J. A., Weiner, E. S. C., and Oxford University Press. The Oxford English Dictionary. Oxford: Clarendon Press. 1989.

Published

20.06.2024

How to Cite

Sanil Shanker K. P. (2024). A Note on the String Metric for Word Similarity. International Journal of Intelligent Systems and Applications in Engineering, 12(4), 527–531. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/6254

Section

Research Article