ULODF: An Unsupervised Learning based Outlier Detection Framework in High Dimensional Data

Authors

  • C. Jayaramulu, Bondu Venkateswarlu

Keywords:

Outlier Detection, Unsupervised Learning, Outlier Detection Framework, Clustering High Dimensional Data

Abstract

Outliers play a crucial role in applications such as disease diagnosis, fraud detection, and cyber security, to name a few. Unsupervised learning techniques such as clustering are widely used in machine learning for outlier detection. However, most existing methods do not exploit the dual benefit of clustering, which can both produce quality clusters and identify outliers effectively. We propose a framework named Unsupervised Learning based Outlier Detection Framework (UL-ODF) and define an algorithm named Novel Outlier Detection Method in High Dimensional Data (NODM-HDD). The algorithm includes mechanisms that improve the compactness of the clusters it forms while also determining outliers, and it builds on an enhanced version of the K-Means clustering technique. A prototype is built to validate the utility of the framework and the underlying algorithm. Several benchmark datasets and metrics are used in the empirical study. The experimental results show that NODM-HDD performs better than the state of the art.
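
The abstract does not spell out how NODM-HDD enhances K-Means, so the following is only a minimal sketch of the general idea of clustering and flagging outliers in one pass. It assumes a trimmed variant of K-Means in which the points farthest from their centroids are set aside before each centroid update, which keeps clusters compact. The function name `kmeans_trimmed`, the parameter `n_outliers`, and the choice to seed centroids with the first k points are illustrative assumptions, not the authors' method.

```python
import math


def kmeans_trimmed(points, k, n_outliers, n_iter=50):
    """K-means that sets aside the n_outliers farthest points at every update."""
    # Illustrative assumption: seed the centroids with the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels, outliers = [], []
    for _ in range(n_iter):
        # Assignment step: nearest centroid and the distance to it.
        labels, dists = [], []
        for p in points:
            d = [math.dist(p, c) for c in centroids]
            j = d.index(min(d))
            labels.append(j)
            dists.append(d[j])
        # Flag the n_outliers points that lie farthest from their centroid.
        ranked = sorted(range(len(points)), key=lambda i: dists[i], reverse=True)
        outliers = sorted(ranked[:n_outliers])
        flagged = set(outliers)
        # Update step: recompute centroids from the non-flagged members only.
        new_centroids = []
        for j in range(k):
            members = [points[i] for i in range(len(points))
                       if labels[i] == j and i not in flagged]
            if members:
                new_centroids.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                # Keep the previous centroid if a cluster ends up empty.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels, outliers


if __name__ == "__main__":
    # Two tight groups plus one distant point (hypothetical toy data).
    data = [(0, 0), (10, 10), (0, 1), (1, 0), (1, 1),
            (10, 11), (11, 10), (11, 11), (10, 30)]
    centroids, labels, outliers = kmeans_trimmed(data, k=2, n_outliers=1)
    print("centroids:", centroids)
    print("outlier indices:", outliers)
```

Because flagged points are excluded from the centroid update, a distant point cannot drag a centroid toward itself. The number of outliers must be chosen in advance in this sketch; a distance threshold would be an equally plausible alternative.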

Published

26.03.2024

How to Cite

Jayaramulu, C., & Venkateswarlu, B. (2024). ULODF: An Unsupervised Learning based Outlier Detection Framework in High Dimensional Data. International Journal of Intelligent Systems and Applications in Engineering, 12(21s), 1675–1686. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/5643

Section

Research Article