A Breakthrough in Anomaly Detection using Variational Auto-encoders and Enhanced Clustering Technique Using Elevating Spam Review Detection
Keywords:
DC-VAEs, H-DBSCAN, PyTorch, Spam Detection, Cyber Security, Anomaly Detection
Abstract
This research presents techniques for anomaly detection, with a specific focus on identifying spam reviews in textual data. As user-generated content grows in volume and importance, robust spam review detection has become increasingly pressing. Our methodology combines two techniques: Deep Convolutional Variational Auto-encoders (DC-VAEs) for feature extraction and Hierarchical Density-Based Clustering (H-DBSCAN) for enhanced clustering. DC-VAEs, implemented using the PyTorch framework, extract intricate, context-aware features from textual data. By building on convolutional neural networks, they capture subtle patterns and anomalies that often elude traditional methods. H-DBSCAN, implemented in Python, complements this feature extraction with a robust hierarchical clustering framework that separates legitimate reviews from spam with a high degree of accuracy. Its hierarchical nature allows clusters to be identified at multiple levels of granularity, giving a nuanced view of the data distribution and anomaly patterns. Extensive experimentation across diverse real-world datasets validates the effectiveness of our approach: our techniques consistently outperform conventional methods in spam review detection. The implications of our findings also extend beyond spam review identification. The combination of DC-VAEs and H-DBSCAN shows potential in other domains where precise anomaly detection is important, such as fraud detection, cybersecurity, and quality control, where the techniques can be adapted to uncover hidden anomalies and support decision-making. Thus, our research contributes to spam review identification and opens new avenues for anomaly detection in diverse applications.
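The feature-extraction step described in the abstract can be illustrated with a minimal PyTorch sketch. The abstract does not specify the architecture, so everything here is an assumption made for illustration: the class name `ConvTextVAE`, the helper `vae_loss`, the layer sizes, the use of token-id input, and the random batch standing in for tokenized reviews. It shows only the general shape of a convolutional VAE whose latent means serve as review features, not the authors' model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvTextVAE(nn.Module):
    """Toy 1-D convolutional VAE over token-id sequences (hypothetical sizes)."""

    def __init__(self, vocab_size=1000, embed_dim=64, seq_len=32, latent_dim=16):
        super().__init__()
        assert seq_len % 4 == 0, "seq_len must be divisible by 4 for this sketch"
        self.seq_len = seq_len
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Encoder: two strided convolutions, each halving the sequence length.
        self.enc = nn.Sequential(
            nn.Conv1d(embed_dim, 128, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        self.flat_dim = 128 * (seq_len // 4)
        self.to_mu = nn.Linear(self.flat_dim, latent_dim)
        self.to_logvar = nn.Linear(self.flat_dim, latent_dim)
        # Decoder: mirrors the encoder and ends in per-position vocabulary logits.
        self.from_z = nn.Linear(latent_dim, self.flat_dim)
        self.dec = nn.Sequential(
            nn.ConvTranspose1d(128, 128, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose1d(128, vocab_size, kernel_size=4, stride=2, padding=1),
        )

    def encode(self, tokens):
        x = self.embed(tokens).transpose(1, 2)  # (batch, embed_dim, seq_len)
        h = self.enc(x).flatten(start_dim=1)
        return self.to_mu(h), self.to_logvar(h)

    def forward(self, tokens):
        mu, logvar = self.encode(tokens)
        # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I).
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        h = self.from_z(z).view(-1, 128, self.seq_len // 4)
        logits = self.dec(h)  # (batch, vocab_size, seq_len)
        return logits, mu, logvar


def vae_loss(logits, tokens, mu, logvar):
    """Negative ELBO: token cross-entropy plus KL divergence to N(0, I)."""
    recon = F.cross_entropy(logits, tokens, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return (recon + kl) / tokens.size(0)


torch.manual_seed(0)
model = ConvTextVAE(vocab_size=1000)
batch = torch.randint(0, 1000, (8, 32))  # random stand-in for tokenized reviews
logits, mu, logvar = model(batch)
loss = vae_loss(logits, batch, mu, logvar)
features = model.encode(batch)[0].detach()  # latent means used as features
```

After training, the `features` tensor could be converted to a NumPy array and passed to a hierarchical density-based clusterer, for example scikit-learn's `sklearn.cluster.HDBSCAN`. Treating the points that the clusterer labels as noise as spam candidates is one plausible reading of the pipeline, not something the abstract confirms.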
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license lets others remix, transform, and build upon the material, provided they give appropriate credit, provide a link to the license, indicate if changes were made, and distribute their contributions under the same license as the original.