Hadoop Distributed File System Write Operations

Renukadevi Chuppala

Authors

Renukadevi Chuppala, B. Purnachandra Rao

Keywords:

Hadoop Distributed File System (HDFS), NameNode, DataNode, Replica, Rack Awareness, Data Packet, Data Packet Transfer Time, Pipeline, Fully Connected Digraph Network Topology, A* Algorithm.

Abstract

Hadoop is an open-source implementation of the MapReduce framework for distributed processing, and a Hadoop cluster can manage very large volumes of data. Hadoop stores this data in the Hadoop Distributed File System (HDFS). To write a file, the client retrieves block information from the NameNode and then transfers the data to the DataNodes. The DataNodes that store the blocks are connected in a pipeline. If a DataNode or network link fails during a write, the failed DataNode is removed from the pipeline and a new DataNode is added from those remaining in the cluster. If the cluster has few spare nodes, clients may see an abnormally high rate of pipeline failures because no additional or replacement DataNodes can be found. Because the DataNodes are connected only in a pipeline, a network failure also prevents a data packet from reaching the target DataNode. Connecting every DataNode to every other DataNode provides multiple paths through other DataNodes, so a single network failure no longer blocks delivery. The pipeline arrangement also lengthens the copy operation. By contrast, a direct connection between a DataNode and all other DataNodes significantly reduces the transfer time, because a data packet does not have to traverse all the other DataNodes to reach the final DataNode. This paper presents the use of the A* algorithm to improve the performance of write operations in HDFS.
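The abstract does not give the search procedure itself, so the following is only a minimal sketch of how A* could pick a low-cost path between DataNodes in a fully connected digraph. The node names, the link costs (read here as hypothetical packet transfer times), the `a_star` function, and the heuristic (the cheapest link into the target) are all assumptions made for illustration and are not taken from the paper.

```python
import heapq


def a_star(links, source, target):
    """Return (total_cost, path) for the cheapest path, or None if unreachable.

    links maps each node to a dict {neighbor: link_cost}; costs must be >= 0.
    """
    # Assumed heuristic: any path to the target must end with a link into it,
    # so the cheapest incoming link is a lower bound on the remaining cost.
    incoming = [cost for nbrs in links.values()
                for nbr, cost in nbrs.items() if nbr == target]
    min_in = min(incoming, default=0.0)

    def h(node):
        return 0.0 if node == target else min_in

    # Heap entries: (estimated total cost, cost so far, node, path so far).
    frontier = [(h(source), 0.0, source, [source])]
    best = {source: 0.0}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == target:
            return g, path
        if g > best.get(node, float("inf")):
            continue  # stale entry; a cheaper route to this node was found
        for nbr, cost in links.get(node, {}).items():
            new_g = g + cost
            if new_g < best.get(nbr, float("inf")):
                best[nbr] = new_g
                heapq.heappush(frontier, (new_g + h(nbr), new_g, nbr, path + [nbr]))
    return None


# Hypothetical fully connected digraph of four DataNodes (illustrative costs).
links = {
    "dn1": {"dn2": 4.0, "dn3": 2.0, "dn4": 9.0},
    "dn2": {"dn1": 4.0, "dn3": 1.0, "dn4": 3.0},
    "dn3": {"dn1": 2.0, "dn2": 1.0, "dn4": 6.0},
    "dn4": {"dn1": 9.0, "dn2": 3.0, "dn3": 6.0},
}

print(a_star(links, "dn1", "dn4"))  # (6.0, ['dn1', 'dn3', 'dn2', 'dn4'])

# Simulate a failed link dn2 -> dn4: another path is still available.
degraded = {node: dict(nbrs) for node, nbrs in links.items()}
del degraded["dn2"]["dn4"]
print(a_star(degraded, "dn1", "dn4"))  # (8.0, ['dn1', 'dn3', 'dn4'])
```

In this sketch a failed link is modelled by deleting its entry from the adjacency map. Because every DataNode still has links to the others, the search returns an alternative route instead of failing, which is the property the abstract attributes to the fully connected topology.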


