34 | 6.1. YARN Why YARN?


Lesson number: 34
00:02:48

Youtube

02-07-2024

Video lesson 6.1. YARN Why YARN? from the Apache Hadoop course by the CloudxLab Official channel; video number 34, free, certified, online.

In this session, we are going to discuss YARN - Yet Another Resource Negotiator.
YARN is a resource manager which keeps track of various resources such as memory and CPU of machines in the network. It also runs applications on the machines and keeps track of what is running where.
Before jumping into the YARN architecture, let us try to understand with an example why we need distributed computing.
Let us say we have a computer with a 1 GHz processor and 1 GB of RAM. It takes 20 milliseconds to read a profile pic from disk and then 5 more milliseconds to resize it.
How much time would this computer take to resize a million profile pics?
Can we do two things in parallel when dealing with so many pics?
Yes, because reading from disk mainly involves the disk, while resizing mainly involves the CPU and RAM.
So, reading and resizing can be done in parallel as shown in the diagram. In the diagram, time is increasing from left to right.
You can see that while pic1 is being resized, pic2 is being read from the disk.
For three pics, it takes 20 times 3 milliseconds of reading plus 5 milliseconds for the final resize, not 25 times 3. So, it takes 65 ms, not 75 ms.
So, it is only the disk read time that matters; at large scale we can completely ignore the last 5 ms. For one million pics, it would be 1 million times 20 milliseconds, which is approximately 5.5 hours.
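The arithmetic above can be checked with a small sketch. The function names and the two-stage pipeline model (one reader, one resizer, each handling one pic at a time) are assumptions made for illustration; only the 20 ms and 5 ms figures come from the session.

```python
def sequential_ms(n_pics, read_ms=20, resize_ms=5):
    """Total time when each pic is read and then resized, one after another."""
    return n_pics * (read_ms + resize_ms)


def pipelined_ms(n_pics, read_ms=20, resize_ms=5):
    """Total time when reading pic k+1 overlaps with resizing pic k.

    Assumed model: a single disk reader and a single resizer. The first
    read and the last resize cannot overlap with anything; in between,
    the slower of the two stages sets the pace.
    """
    if n_pics == 0:
        return 0
    return read_ms + (n_pics - 1) * max(read_ms, resize_ms) + resize_ms


if __name__ == "__main__":
    print(sequential_ms(3), pipelined_ms(3))
    million = pipelined_ms(1_000_000)
    print(f"{million / 3_600_000:.2f} hours")
```

With the default figures, the slower stage is the disk read, so the total grows by about 20 ms per pic regardless of how fast the resize is.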
5.5 hours is not good enough. The next question is: how can we make it faster?
If we use a computer which has four cores or processors, can this process finish in less than 5.5 hours?
No, because it is not the CPU which is causing the delay. The main time is being consumed in disk reads. If we make disk reads faster, the process will become faster. Disk reads can be made faster by using Solid State Drives and by using many disk drives.
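A rough way to see why extra cores do not help while extra disks do is the throughput estimate below. It is an idealized sketch, not a measurement: it assumes the pics are spread evenly over independent disks, that resizing splits evenly over cores, and that the slower stage alone determines the total time. The function name and parameters are hypothetical.

```python
def estimated_hours(n_pics, cores=1, disks=1, read_ms=20, resize_ms=5):
    """Idealized estimate: the slower stage (disk or CPU) bounds throughput.

    Assumes perfect, even parallelism across disks and across cores,
    and ignores the small startup and drain time of the pipeline.
    """
    per_pic_read = read_ms / disks      # effective read time per pic
    per_pic_resize = resize_ms / cores  # effective resize time per pic
    total_ms = n_pics * max(per_pic_read, per_pic_resize)
    return total_ms / 3_600_000         # milliseconds in one hour


if __name__ == "__main__":
    n = 1_000_000
    print(f"1 core,  1 disk : {estimated_hours(n):.2f} h")
    print(f"4 cores, 1 disk : {estimated_hours(n, cores=4):.2f} h")
    print(f"1 core,  4 disks: {estimated_hours(n, disks=4):.2f} h")
```

Under these assumptions, going from one core to four leaves the estimate unchanged because the disk is still the bottleneck, while going from one disk to four cuts it to roughly a quarter.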
