An Advanced Approach to Inspect the Influence of Dataset Size on the Enactment of Datamining Processes

Authors

  • Rajashekhar Gouda C. Patil, Professor, Electronics and Communication Engineering, Visvesvaraya Technological University, ORCID: https://orcid.org/0000-0002-4352-132X
  • Praising Linijah N. L., Assistant Professor, Management Studies, Karunya Institute of Technology and Sciences, Coimbatore, India, ORCID: https://orcid.org/0000-0002-1900-674X
  • Aniruddha Bodhankar, Faculty Member, Data Science & Information Systems, Dr. Ambedkar Institute of Management Studies and Research, Nagpur, India
  • B. Anniprincy, Professor, Computer and Communication Engineering, Panimalar Engineering College, Chennai, Tamil Nadu, India, ORCID: https://orcid.org/0000-0002-1464-1402
  • Manuel R. Tanpoco, Assistant Professor, Department of Decision Sciences and Innovation, De La Salle University
  • Dhanashree Toradmalle, Associate Professor, Computer Engineering, K J Somaiya Institute of Technology, Mumbai, India

Keywords:

Datamining, C4.5, KEEL, Bayesian-D

Abstract

A new method is proposed for organising potential donors into distinct groups based on their eligibility and level of interest. Information extraction and categorization methods have been developed; they correspond to learning that leads to a definitive categorization based on an assessment of the relevant true values. The same large-scale clustering algorithms are typically employed. Among the advanced clustering methods defined, the partitioning approach over medoids is the most commonly used to construct clusters. Each iteration produces a clearer, more compact set of cluster objects in parallel with the donor search. To make the system more resilient to noise and structure, it is designed to simplify the process of establishing clusters. The study also takes outliers into account. We evaluate the efficiency of a mix of classification algorithms, combined with the Bayesian-D pre-processing technique implemented in the KEEL tool, while varying the number of records in the dataset from 500 to 4000. We examine how dataset size affects training and testing classification accuracy. Experimental results show that C4.5-C outperformed the other algorithms, and that as the sample size is varied from 500 to 4000, the global classification error averages 0.00185 with a standard deviation of 0.00421, and the rate of correctly classified samples is 0.996.
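The abstract names the partitioning approach over medoids as the clustering core. The sketch below is a minimal, assumed illustration of that idea, not the authors' implementation. It assumes Euclidean distance (the abstract names no metric), starts from a greedy BUILD-style initialization, and then alternates between assigning each record to its nearest medoid and moving each medoid to the cluster member with the smallest total distance to the other members. That alternating update is a simpler substitute for the full PAM swap search. The function names (`build_init`, `assign`, `k_medoids`, `total_cost`) and the toy points are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance (an assumed metric; the abstract does not name one)."""
    return math.dist(a, b)


def total_cost(points: Sequence[Point], medoids: List[int]) -> float:
    """Sum of every point's distance to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def build_init(points: Sequence[Point], k: int) -> List[int]:
    """Greedy BUILD-style start: take the most central point, then repeatedly
    add the candidate that lowers the total cost the most."""
    n = len(points)
    medoids = [min(range(n), key=lambda i: sum(dist(points[i], q) for q in points))]
    while len(medoids) < k:
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: total_cost(points, medoids + [c])))
    return medoids


def assign(points: Sequence[Point], medoids: List[int]) -> List[int]:
    """Label each point with the position (0..k-1) of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda j: dist(p, points[medoids[j]]))
            for p in points]


def k_medoids(points: Sequence[Point], k: int, max_iter: int = 100):
    """Alternate assignment and medoid update until the medoids stop changing."""
    medoids = build_init(points, k)
    labels = assign(points, medoids)
    for _ in range(max_iter):
        new_medoids = []
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:  # empty cluster (possible with duplicate points): keep old medoid
                new_medoids.append(medoids[j])
                continue
            # New medoid: the member with minimal summed distance to the other members.
            new_medoids.append(min(
                members, key=lambda i: sum(dist(points[i], points[q]) for q in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
        labels = assign(points, medoids)
    return medoids, labels


if __name__ == "__main__":
    # Two well-separated toy groups of records (hypothetical data).
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    meds, labs = k_medoids(pts, k=2)
    print(meds, labs)  # [3, 0] [1, 1, 1, 0, 0, 0]
```

Because every medoid is an actual record rather than a mean, a few extreme values pull the cluster centres less than they would in k-means, which fits the abstract's concern with noise and outliers. Full PAM additionally tries swapping each medoid with each non-medoid and keeps swaps that lower `total_cost`; that search costs roughly O(k(n-k)^2) per pass but is less prone to poor local optima than the alternating update shown here.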

Published

11.07.2023

How to Cite

C. Patil, R. G., N. L., P. L., Bodhankar, A., Anniprincy, B., Tanpoco, M. R., & Toradmalle, D. (2023). An Advanced Approach to Inspect the Influence of Dataset Size on the Enactment of Datamining Processes. International Journal of Intelligent Systems and Applications in Engineering, 11(8s), 338–345. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/3057