Fuzzy-Based Event Clustering for Semantic Load Shedding of Real-Time Data Streaming

Authors

  • Shubham Vyas, Ph.D. Research Scholar, Amity Institute of Information Technology, Gurugram, Haryana, India
  • Rajesh Kumar Tyagi, Ph.D., Professor, Department of Computer Science and Engineering, Amity School of Engineering and Technology, Gurgaon, Haryana, India
  • Shashank Sahu, Ph.D., Professor, Department of Computer Science & Engineering, Ajay Kumar Garg Engineering College, Ghaziabad, India

Keywords:

Semantic Load Shedding, Fuzzy Clustering, K Nearest Neighbor (KNN), Apache Kafka, Real-Time Data Streaming

Abstract

In real-time data stream processing, load shedding is used to manage data overload. This study applies fuzzy-based event clustering and load shedding to optimize the performance of Apache Kafka. It presents a hybrid load-shedding strategy that achieves high recall while retaining the throughput and cost models needed to calculate the value of the matched events to shed. It also shows that dropping a constant fraction of input events can reduce latency without losing recall. Among the approaches compared, state-based methods achieved the highest recall and input-based methods the highest throughput. As time slices become more significant, the hybrid technique with four or more slices gives the best balance of high recall and acceptable throughput. These findings can inform machine learning algorithms and load-shedding strategies for many applications. The work is ongoing: future experiments will test the method's flexibility by using automated algorithms to determine the ideal sampling rate for a system environment defined by its workload, data flow, and resources.

Published

05.12.2023

How to Cite

Vyas, S., Tyagi, R. K., Sahu, S., & Tyagi, R. K. (2023). Fuzzy-Based Event Clustering for Semantic Load Shedding of Real-Time Data Streaming. International Journal of Intelligent Systems and Applications in Engineering, 12(7s), 521–528. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/4155

Section

Research Article
