Propositional Aspects of Big Data Tools: A Comprehensive Guide to Apache Spark

Authors

  • Jyoti Chaudhary, Research Scholar, Department of Computer Science, Banasthali Vidyapith, Rajasthan
  • Vaibhav Vyas, Associate Professor, Department of Computer Science, Banasthali Vidyapith, Rajasthan

Keywords

Apache Spark, Hadoop, Big data, Hive, Pig

Abstract

Big data analysis has had a marked impact on industry. Analyzing large and diverse datasets reveals hidden patterns and other insights. Apache Spark is one of the most popular big data tools for processing massive amounts of data: it is a unified analytics engine for large-scale data that offers implicit data parallelism. This paper presents an in-depth examination of big data analytical techniques, with a technical review of Apache Spark's in-memory computing capabilities, which make it noticeably faster than comparable frameworks for large-scale data analytics. Spark also supports both batch processing and stream processing. The paper further discusses Apache Spark's multithreading and concurrency features. Its central focus is the Apache Spark architecture, its evolution and ecosystem, application cases, Spark's features, and the need for Apache Spark in applications, together with a comparison with Apache Hadoop.

Published

12.01.2024

How to Cite

Chaudhary, J., & Vyas, V. (2024). Propositional Aspects of Big Data Tools: A Comprehensive Guide to Apache Spark. International Journal of Intelligent Systems and Applications in Engineering, 12(12s), 631 –. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/4547

Section

Research Article