An Optimized Integer Representation through a Novel Numeric Encoding for Textual Data Compression

Authors

  • Kanak Pandit, Harshali Patil, Poonam Joshi, Tarunima Mukherjee

Keywords:

Burrows-Wheeler Transform, Elias Delta Code, Elias Gamma Code, Golomb Code, Numeric Encoding

Abstract

This paper introduces a new variable-sized integer encoding technique for file compression and compares its performance with the established Elias Gamma, Elias Delta, and Golomb codes. It also examines how varying the log base affects compression ratio and runtime efficiency. The proposed method combines radix conversion with the Burrows-Wheeler Transform. Performance is compared on the Calgary corpus, which includes both text and binary files: the Elias Gamma, Elias Delta, and Golomb codes are run on the files first, and the proposed code is then evaluated. Graphs are used to analyze the effect of the log base on compression ratio, and runtime efficiency is also assessed. The proposed code achieves compression ratios between 1.67 and 1.87 at radix r=4, highlighting its effectiveness over existing algorithms. The relationship between log base and compression ratio is non-linear, with the ratio plateauing as the log base increases. Runtime varies among files, from 0.09 seconds for 'obj1' (the shortest) to 6.41 seconds for 'bib1' (the longest). Runtime correlates positively with the number of data points (n), while compression ratio correlates negatively with n, so files with larger n achieve lower ratios. Comparison against established codes provides a benchmark for evaluation, and the analysis of compression ratio trends and runtime efficiency offers insight into the effectiveness of the proposed method, adding to its novelty.
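
For readers unfamiliar with the baseline codes and the transform named in the abstract, the following is a minimal Python sketch of textbook Elias Gamma, Elias Delta, and Golomb encoders, together with a naive Burrows-Wheeler Transform. It does not reproduce the proposed encoding, which the abstract does not specify. The function names, the bit-string output, the Golomb parameter m, and the "\0" sentinel are all assumptions made for illustration.

```python
def elias_gamma(n: int) -> str:
    """Elias Gamma code of a positive integer, returned as a bit string."""
    if n < 1:
        raise ValueError("Elias Gamma is defined for n >= 1")
    binary = bin(n)[2:]  # binary form; the leading bit is always 1
    # (len - 1) zeros announce the length, then the binary form follows.
    return "0" * (len(binary) - 1) + binary


def elias_delta(n: int) -> str:
    """Elias Delta code: Gamma-coded bit length, then n without its leading 1."""
    if n < 1:
        raise ValueError("Elias Delta is defined for n >= 1")
    binary = bin(n)[2:]
    return elias_gamma(len(binary)) + binary[1:]


def golomb(n: int, m: int) -> str:
    """Golomb code of n >= 0 with parameter m >= 1 (an illustrative choice).

    The quotient is written in unary (ones terminated by a zero) and the
    remainder in truncated binary.
    """
    if n < 0 or m < 1:
        raise ValueError("Golomb coding needs n >= 0 and m >= 1")
    q, r = divmod(n, m)
    unary = "1" * q + "0"
    if m == 1:
        return unary  # the remainder is always 0 and needs no bits
    b = (m - 1).bit_length()   # ceil(log2(m)) for m >= 2
    cutoff = (1 << b) - m      # remainders below this use b - 1 bits
    if r < cutoff:
        return unary + format(r, f"0{b - 1}b")
    return unary + format(r + cutoff, f"0{b}b")


def bwt(text: str, sentinel: str = "\0") -> str:
    """Naive Burrows-Wheeler Transform by sorting all rotations.

    The sentinel is assumed not to occur in the text. This takes quadratic
    memory and is for illustration only.
    """
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


if __name__ == "__main__":
    for value in (1, 5, 10):
        print(value, elias_gamma(value), elias_delta(value), golomb(value, 4))
    print(repr(bwt("banana")))
```

Small integers receive short codewords under all three codes, which is why such codes are typically applied after a transform like the BWT that tends to group repeated symbols together.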

Published

26.03.2024

How to Cite

Kanak Pandit, Harshali Patil, Poonam Joshi, Tarunima Mukherjee. (2024). An Optimized Integer Representation through a Novel Numeric Encoding for Textual Data Compression. International Journal of Intelligent Systems and Applications in Engineering, 12(21s), 374–379. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/5433

Section

Research Article