An Advanced Approach to Inspect the Influence of Dataset Size on the Enactment of Datamining Processes
Keywords:
Datamining, C4.5, KEEL, Bayesian-D
Abstract
We propose a new method for organising potential donors into distinct groups based on their eligibility and level of interest. The method builds on information extraction and categorization techniques, which correspond to learning that yields a definitive categorization from an assessment of the relevant true values. Large-scale clustering algorithms are typically employed for this purpose; among the advanced clustering methods defined, partitioning around medoids is the one most commonly used to construct clusters. With each iteration, a clearer and more compact set of cluster objects is produced in parallel with the donor search. The system is defined so that establishing clusters is simple, which makes it more resilient against noise and structure, and the study also takes outliers into account. We evaluate the performance of a mix of classification algorithms, combined with the Bayesian-D pre-processing technique implemented in the KEEL tool, while varying the number of records in the dataset from 500 to 4000, and we examine how dataset size affects training and testing classification accuracy. Experimental results show that C4.5-C outperformed the other algorithms and that, as the sample size is varied from 500 to 4000, the global classification error averages 0.00185 with a standard deviation of 0.00421 and the rate of correctly classified samples is 0.996.
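The size-variation protocol described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' setup: the data is synthetic, a one-feature threshold learner stands in for C4.5-C, the Bayesian-D pre-processing step is omitted, and a simple 80/20 split replaces whatever validation scheme KEEL was configured with. It only shows the shape of the experiment: train and test at each dataset size from 500 to 4000, then summarise the error as a mean and standard deviation.

```python
import random
import statistics


def make_dataset(n, rng):
    """Synthetic two-class records (hypothetical data, not the paper's).

    One numeric feature in [0, 1); the label is 1 when the feature exceeds
    0.5, and 5% of the labels are flipped to simulate noise.
    """
    rows = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < 0.05:
            label = 1 - label
        rows.append((x, label))
    return rows


def fit_stump(train):
    """Pick the threshold with the lowest training error.

    This is a stand-in learner, far simpler than C4.5.
    """
    best_threshold, best_errors = 0.0, float("inf")
    for threshold in (i / 100 for i in range(101)):
        errors = sum(int(x > threshold) != y for x, y in train)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def error_rate(threshold, rows):
    """Fraction of records the threshold rule misclassifies."""
    return sum(int(x > threshold) != y for x, y in rows) / len(rows)


def run_experiment(sizes, seed=0, train_fraction=0.8):
    """Return (size, train_error, test_error) for each dataset size."""
    rng = random.Random(seed)
    results = []
    for n in sizes:
        rows = make_dataset(n, rng)
        cut = int(n * train_fraction)  # assumed 80/20 hold-out split
        train, test = rows[:cut], rows[cut:]
        threshold = fit_stump(train)
        results.append(
            (n, error_rate(threshold, train), error_rate(threshold, test))
        )
    return results


# Dataset sizes from 500 to 4000 records, as in the abstract; the step of
# 500 is an assumption.
sizes = list(range(500, 4001, 500))
results = run_experiment(sizes)

test_errors = [test_error for _, _, test_error in results]
mean_error = statistics.mean(test_errors)
std_error = statistics.stdev(test_errors)

for n, train_error, test_error in results:
    print(f"n={n:5d}  train_error={train_error:.4f}  test_error={test_error:.4f}")
print(f"mean test error={mean_error:.4f}  std={std_error:.4f}")
```

The numbers this sketch prints depend entirely on the synthetic data and are not comparable to the figures reported in the abstract; swapping in a real learner and dataset would only require replacing `make_dataset` and `fit_stump`.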
License
Copyright (c) 2023 Rajashekhar Gouda C. Patil, Praising Linijah N. L., Aniruddha Bodhankar, B. Anniprincy, Manuel R. Tanpoco, Dhanashree Toradmalle
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Under this license, users must give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.