A Breakthrough in Anomaly Detection using Variational Auto-encoders and Enhanced Clustering Technique Using Elevating Spam Review Detection

Authors

  • Sripathi S, Research Scholar, Department of Computer Science, Bharathidasan University, Tiruchirapalli, Tamilnadu, India-620024
  • Shanthi P M, Assistant Professor, Department of Information Technology, JJ Arts and Science College, Puthukottai, Tamilnadu, India-622422

Keywords:

DC-VAEs, H-DBSCAN, PyTorch, Spam Detection, Cybersecurity, Anomaly Detection

Abstract

This research presents techniques for anomaly detection in textual data, with a specific focus on identifying spam reviews. As user-generated content grows in volume and importance, robust spam review detection becomes increasingly necessary. Our methodology rests on two techniques: Deep Convolutional Variational Auto-encoders (DC-VAEs) for feature extraction and Hierarchical Density-Based Clustering (H-DBSCAN) for enhanced clustering. DC-VAEs, implemented using the PyTorch framework, extract context-aware features from textual data. By building on convolutional neural networks, DC-VAEs capture subtle patterns, nuances, and anomalies that often elude traditional methods. Complementing this feature extraction is H-DBSCAN, implemented in Python, which provides a hierarchical clustering framework that separates legitimate reviews from spam with a high degree of accuracy. Its hierarchical nature allows clusters to be identified at multiple levels of granularity, giving a nuanced view of the data distribution and anomaly patterns. Extensive experimentation across diverse real-world datasets validates the effectiveness of our approach: our techniques consistently outperform conventional methods, advancing the state of the art in anomaly detection within textual data. The implications of our findings extend beyond spam review identification. The combination of DC-VAEs and H-DBSCAN shows potential in other domains where precise anomaly detection is important, including fraud detection, cybersecurity, and quality control, where the techniques can be adapted to uncover hidden anomalies and support decision-making. Thus, our research contributes to spam review identification and opens new avenues for anomaly detection in diverse applications.
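
The feature-extraction step described in the abstract can be illustrated with a minimal PyTorch sketch of a convolutional variational auto-encoder over token sequences. The abstract does not specify the architecture, so everything here is an assumption made for illustration: the class name `ConvTextVAE`, the layer sizes, the vocabulary size, the sequence length, and the use of random token ids as a stand-in for tokenized reviews. It is not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvTextVAE(nn.Module):
    """Illustrative 1-D convolutional VAE over token ids (all sizes are hypothetical)."""

    def __init__(self, vocab_size=1000, embed_dim=64, seq_len=32, latent_dim=16):
        super().__init__()
        # This sketch assumes seq_len is divisible by 4.
        self.seq_len = seq_len
        self.vocab_size = vocab_size
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Encoder: two strided 1-D convolutions, each halving the sequence length.
        self.enc = nn.Sequential(
            nn.Conv1d(embed_dim, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv1d(64, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        enc_out = 32 * (seq_len // 4)
        self.to_mu = nn.Linear(enc_out, latent_dim)
        self.to_logvar = nn.Linear(enc_out, latent_dim)
        # Decoder: a single linear layer predicting vocabulary logits per position.
        self.dec = nn.Linear(latent_dim, seq_len * vocab_size)

    def encode(self, tokens):
        x = self.embed(tokens).transpose(1, 2)  # (batch, embed_dim, seq_len)
        h = self.enc(x).flatten(start_dim=1)
        return self.to_mu(h), self.to_logvar(h)

    def forward(self, tokens):
        mu, logvar = self.encode(tokens)
        # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I).
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        logits = self.dec(z).view(-1, self.seq_len, self.vocab_size)
        return logits, mu, logvar


def vae_loss(logits, tokens, mu, logvar):
    batch = tokens.size(0)
    # Reconstruction term: token-level cross-entropy, summed and averaged per review.
    recon = F.cross_entropy(logits.transpose(1, 2), tokens, reduction="sum") / batch
    # KL divergence between N(mu, sigma^2) and the standard normal prior.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / batch
    return recon + kl


torch.manual_seed(0)
model = ConvTextVAE()
tokens = torch.randint(0, 1000, (8, 32))  # stand-in for 8 tokenized reviews
logits, mu, logvar = model(tokens)
loss = vae_loss(logits, tokens, mu, logvar)
features = mu.detach()  # latent means used as per-review feature vectors
```

In a pipeline of the kind the abstract describes, feature vectors like `features` would then be passed to a hierarchical density-based clusterer. One option is `sklearn.cluster.HDBSCAN` (available in scikit-learn 1.3 and later), which assigns the label -1 to points it treats as noise; whether this matches the H-DBSCAN variant used in the paper is an assumption.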

Published

02.02.2024

How to Cite

S, S., & P M, S. (2024). A Breakthrough in Anomaly Detection using Variational Auto-encoders and Enhanced Clustering Technique Using Elevating Spam Review Detection. International Journal of Intelligent Systems and Applications in Engineering, 12(14s), 22–31. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/4630

Section

Research Article