Hadoop Distributed File System Write Operations
Keywords:
Hadoop Distributed File System (HDFS), NameNode, DataNode, Replica, Rack Awareness, Data Packet, Data Packet Transfer Time, Pipeline, Fully Connected Digraph Network Topology, A* algorithm.
Abstract
Hadoop is an open-source implementation of the MapReduce framework for distributed processing, and a Hadoop cluster can manage very large volumes of data. To store that data, Hadoop uses the Hadoop Distributed File System (HDFS). To write a file, the client retrieves block information from the NameNode and then transfers the data to the DataNodes. The DataNodes that store the blocks are connected in a pipeline. If a DataNode or a network link fails during a write, the failed DataNode is removed from the pipeline and a replacement is chosen from the DataNodes remaining in the cluster. When the cluster has few spare nodes, clients may see an abnormally high rate of pipeline failures because no additional or replacement DataNode can be found. In addition, because the DataNodes are chained in a pipeline, a network failure prevents a data packet from reaching the target DataNode. Connecting each DataNode to every other DataNode makes multiple paths available through other DataNodes, so a failure on one link need not block delivery. The pipeline also lengthens the copy operation, whereas a direct connection between a DataNode and all other DataNodes significantly reduces the time required, because the data packet does not have to traverse every other DataNode to reach the final one. This paper presents the use of the A* algorithm to improve the performance of write operations in HDFS.
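The contrast between the pipeline and a fully connected DataNode topology can be illustrated with a small sketch. The code below is not the paper's implementation: the node names, the link weights (read here as hypothetical packet transfer times in milliseconds), the helper names, and the choice of heuristic (the smallest link weight in the graph, which never overestimates the remaining cost) are all assumptions made for illustration. It runs A* over a directed graph in which every DataNode is linked to every other, compares the result with the fixed pipeline path, and repeats the search after one link has failed.

```python
import heapq

def a_star(graph, start, goal, heuristic):
    """Return (cost, path) for a cheapest path from start to goal, or None."""
    # Frontier entries: (estimated total cost, cost so far, node, path so far).
    frontier = [(heuristic(start), 0, start, [start])]
    best = {start: 0}
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale entry; a cheaper route to this node was found
        for nxt, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                estimate = new_cost + heuristic(nxt)
                heapq.heappush(frontier, (estimate, new_cost, nxt, path + [nxt]))
    return None

def make_heuristic(graph, goal):
    """Assumed heuristic: 0 at the goal, otherwise the smallest link weight."""
    min_weight = min(w for edges in graph.values() for w in edges.values())
    return lambda node: 0 if node == goal else min_weight

def without_link(graph, src, dst):
    """Copy of the graph with the directed link src -> dst removed (a failure)."""
    return {n: {m: w for m, w in edges.items() if (n, m) != (src, dst)}
            for n, edges in graph.items()}

def path_cost(graph, path):
    """Total weight of a fixed path, such as the write pipeline."""
    return sum(graph[a][b] for a, b in zip(path, path[1:]))

# Hypothetical transfer times (ms) on a fully connected DataNode digraph.
GRAPH = {
    "client": {"dn1": 4, "dn2": 6, "dn3": 9},
    "dn1": {"dn2": 5, "dn3": 3},
    "dn2": {"dn1": 5, "dn3": 6},
    "dn3": {"dn1": 3, "dn2": 6},
}

if __name__ == "__main__":
    h = make_heuristic(GRAPH, "dn3")
    print(path_cost(GRAPH, ["client", "dn1", "dn2", "dn3"]))  # pipeline: 15
    print(a_star(GRAPH, "client", "dn3", h))  # (7, ['client', 'dn1', 'dn3'])
    degraded = without_link(GRAPH, "dn1", "dn3")
    print(a_star(degraded, "client", "dn3", make_heuristic(degraded, "dn3")))
    # (9, ['client', 'dn3'])
```

With these made-up weights, the packet reaches dn3 in 7 ms instead of the 15 ms taken along the pipeline, and when the dn1 to dn3 link fails the search falls back to the direct client to dn3 link rather than failing the write. A heuristic based on real topology information, such as rack distance, could replace the minimum-weight estimate as long as it never overestimates the remaining cost.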
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Under this license, users must give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.