Performance Analysis of a Parameter Selection Model on a Big Data Set

Authors

  • Adusumalli Balaji, Ch. Indira Priyadarsini, Eluri Nageswara Rao, Kalluri Siva Krishna, Nangineni Srikanth, K Ravi kiran Yasaswi, Popuri Srinivasarao

Keywords:

soft set theory, best optimized attribute set, best-first search algorithm, correlation-based feature selection, rough set theory

Abstract

Selecting parameters (attributes) is a pivotal stage of the data analysis process: an erroneous choice can lead to suboptimal decisions. In decision analysis, the decision-maker therefore benefits from being able to select and apply the most suitable model for identifying the best optimized attribute set. Big data has recently drawn a substantial number of data scientists across application domains to explore its advantages and disadvantages. One prominent challenge is that, when no appropriate model is available to serve as a guiding framework, evaluating extensive and diverse data in a big data environment becomes particularly difficult. This study therefore proposes an alternative parameterization approach that yields an optimal attribute set while minimizing the associated learning, utilization, and maintenance costs. The model is constructed by integrating two complementary models with soft set theory, the best-first search algorithm, correlation-based feature selection, and rough set theory, which work together as a parameter selection methodology. In experiments on processing very large datasets, the proposed model emerged as a strong contender.
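
As a rough illustration of two of the ingredients named in the abstract, the sketch below pairs a correlation-based feature selection (CFS) merit score with a best-first search over attribute subsets. It is not the authors' model: the rough set and soft set stages are omitted, absolute Pearson correlation is assumed as the correlation measure, and the names `abs_pearson`, `cfs_merit`, `best_first_cfs`, and the `patience` stopping parameter are hypothetical choices made for this example.

```python
import heapq
from itertools import count
from math import sqrt


def abs_pearson(x, y):
    """Absolute Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation signal
    return abs(sxy / sqrt(sxx * syy))


def cfs_merit(subset, columns, target):
    """CFS merit: k * r_cf / sqrt(k + k * (k - 1) * r_ff).

    r_cf is the mean attribute-target correlation and r_ff the mean
    attribute-attribute correlation within the subset.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs_pearson(columns[f], target) for f in subset) / k
    if k == 1:
        return r_cf
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(abs_pearson(columns[a], columns[b]) for a, b in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def best_first_cfs(columns, target, patience=5):
    """Forward best-first search; stops after `patience` expansions
    in a row that fail to improve the best merit (assumed stopping rule)."""
    names = sorted(columns)
    tie = count()  # tie-breaker so the heap never compares tuples of names
    frontier = [(0.0, next(tie), ())]  # min-heap keyed on negated merit
    visited = {()}
    best_subset, best_merit = (), 0.0
    stale = 0
    while frontier and stale < patience:
        _, _, subset = heapq.heappop(frontier)  # most promising open subset
        improved = False
        for f in names:
            if f in subset:
                continue
            child = tuple(sorted(subset + (f,)))
            if child in visited:
                continue
            visited.add(child)
            merit = cfs_merit(child, columns, target)
            heapq.heappush(frontier, (-merit, next(tie), child))
            if merit > best_merit + 1e-12:
                best_subset, best_merit = child, merit
                improved = True
        stale = 0 if improved else stale + 1
    return best_subset, best_merit


if __name__ == "__main__":
    # Toy data: "a" tracks the target, "b" duplicates "a", "c" is weakly related.
    toy = {
        "a": [1, 2, 3, 4, 5, 6],
        "b": [2, 4, 6, 8, 10, 12],
        "c": [1, -1, 1, -1, 1, -1],
    }
    print(best_first_cfs(toy, [1, 2, 3, 4, 5, 6]))
```

The merit score rewards attributes that correlate with the target and penalizes attributes that correlate with each other, so on the toy data the redundant copy `b` adds nothing once `a` has been selected. A priority queue keeps every evaluated subset open for later expansion, which is what distinguishes best-first search from plain greedy forward selection.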

Published

24.03.2024

How to Cite

Adusumalli Balaji, Ch. Indira Priyadarsini, Eluri Nageswara Rao, Kalluri Siva Krishna, Nangineni Srikanth, K Ravi kiran Yasaswi, Popuri Srinivasarao. (2024). Performance Analysis of a Parameter Selection Model on a Big Data Set. International Journal of Intelligent Systems and Applications in Engineering, 12(3), 2511–2526. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/5723

Section

Research Article