A Note on the String Metric for Word Similarity
Keywords:
Distance measure, Hamming distance, Levenshtein distance, String metric, Word matching
Abstract
This paper presents a string metric for measuring the similarity between words. The distance function satisfies the axioms of non-negativity, reflexivity, symmetry, and the triangle inequality. The metric is compared with the Hamming and Levenshtein distances on a word-matching task.
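The abstract does not define the proposed metric, so the following is only a minimal sketch of the two baseline distances it is compared against, together with a brute-force check of the four axioms on a small word list. The function names and the sample words are illustrative assumptions, not taken from the paper.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is defined for equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn a into b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string costs i deletions
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


def satisfies_metric_axioms(dist, words) -> bool:
    """Check non-negativity, reflexivity, symmetry, and the triangle
    inequality for dist on every triple drawn from a finite word list.
    Passing this check is evidence on the sample only, not a proof."""
    for u, v, w in product(words, repeat=3):
        if dist(u, v) < 0:                       # non-negativity
            return False
        if dist(u, u) != 0:                      # reflexivity
            return False
        if dist(u, v) != dist(v, u):             # symmetry
            return False
        if dist(u, w) > dist(u, v) + dist(v, w):  # triangle inequality
            return False
    return True


# Hypothetical sample words, chosen only for illustration.
equal_length_words = ["word", "ward", "wood", "cord"]
mixed_length_words = ["word", "sword", "wood", "ward"]

hamming_ok = satisfies_metric_axioms(hamming, equal_length_words)
levenshtein_ok = satisfies_metric_axioms(levenshtein, mixed_length_words)
```

Any candidate distance function with the same two-argument signature could be passed to `satisfies_metric_axioms` in the same way. Hamming distance is restricted to equal-length words, which is why the two word lists differ.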
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license requires users to give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.