Apache Hadoop YARN for fun!

Technology terms are generally easy to understand (I lied), but not all the time. When I explain a technology to a mixed group of technologists and non-technologists, I watch their faces: some are clearly trying hard, others are making funny expressions. The moment I notice those expressions, I relate the topic to a real-life scenario to help them understand. The good part is that they understand and can relate to the technology concepts. The bad part is that after they leave, they forget the technology and remember only the story. Hey, don't blame me.

Nowadays I capture all of this in writing so people can go back and relate to it again, and I hope they take the time to remember the technology too. Will you?

A lot of people have asked me about YARN, and even after I explained it multiple times, they found it hard to remember after a few weeks or months. So how about having some fun while we learn? Let us try.

You are a client and you need consultants (CPU cores / memory). You call the vendor's resource manager (the YARN ResourceManager), who is the MASTER. The vendor's resource manager knows all the details: how many managers he has, how many people report to each manager, what skills they have, where they are located, and so on. You understand that without him nothing can be done. What happens if he goes on vacation? The company is dead unless it has a backup. So remember to configure the YARN ResourceManager to run in High Availability (HA) mode.
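To make the "vacation" point concrete, here is a toy Python sketch of an active/standby pair. It is only an illustration under my own assumptions: the class ToyResourceManagerPair and the ids rm1 and rm2 are made up for this post, and a real YARN cluster elects the active ResourceManager through ZooKeeper rather than with a simple lookup like this one.

```python
class ToyResourceManagerPair:
    """Toy model of ResourceManager HA: one active, the rest on standby.

    Hypothetical illustration only; not how YARN implements failover.
    """

    def __init__(self, rm_ids):
        if not rm_ids:
            raise ValueError("need at least one resource manager")
        # Everyone starts healthy; the first id is the active one.
        self.alive = {rm_id: True for rm_id in rm_ids}
        self.active = rm_ids[0]

    def mark_down(self, rm_id):
        """The manager goes on vacation; promote a standby if he was active."""
        self.alive[rm_id] = False
        if rm_id == self.active:
            # Pick the first healthy standby, or None if nobody is left.
            self.active = next(
                (other for other, up in self.alive.items() if up), None
            )

    def submit(self, job):
        """Clients can only be served while some manager is active."""
        if self.active is None:
            raise RuntimeError("no resource manager available")
        return f"{job} accepted by {self.active}"
```

With two managers, calling mark_down("rm1") leaves rm2 active and jobs keep flowing. With only one manager, the next submit fails, which is the "company is dead" case.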

Remember, you don't want your resource manager to do all the work for you, because you would overwhelm him. The previous version of Hadoop did exactly that and soon realized the problem; YARN was then created as a new architecture in which the ResourceManager role does specific jobs and not everything.

Let us walk through the life cycle of the process.

YARN and COMMON LIFE

  1. You reach out to the vendor's resource manager with your request (the job) and your requirement for consultants.
  2. The resource manager accepts the request and sends the job to his reporting manager (YARN ApplicationMaster, which runs in a single container) to deal with the details and nature of the job.
  3. The reporting manager (YARN ApplicationMaster) acknowledges the assignment (registers with the ResourceManager), looks at the client's job requirement, qualifies it, and understands that you need 5 consultants (CPU cores) at a bill rate of $150 each (memory).
  4. The reporting manager (YARN ApplicationMaster) contacts (negotiates with) the resource manager (YARN ResourceManager), asking for 5 consultants (CPU cores) at a bill rate of $150 each (memory). Remember, the centralized resource manager is the one who knows whether the resources are available, since we do not want different staffing managers committing the same resource to multiple clients.
  5. The resource manager (YARN ResourceManager) checks the availability of the resources (cores and memory). If they are available, he grants the reporting manager (YARN ApplicationMaster) permission, tells him which departments (containers) have the resources, and marks those departments as allocated to that reporting manager.
  6. Now the reporting manager (YARN ApplicationMaster) reaches out (container request) to the supervisors (NodeManagers), who actually manage the resources of each department, to provide the resources (cores, memory) needed to get the work (job) completed. The supervisors (NodeManagers) grant the resources. As soon as the resources are allocated, the supervisors notify the resource manager (YARN ResourceManager) so that he does not double-book them.
  7. The reporting manager (YARN ApplicationMaster) now has the resources and owns the responsibility of monitoring them until the contract expires. From now on, the reporting manager (ApplicationMaster) connects DIRECTLY with the client, since he has everything he needs. If any employee is unwell, it is the reporting manager's responsibility to go back to the resource manager (YARN ResourceManager) for a replacement, WITHOUT letting the client know that anyone is scrambling around or failing. Your organization's goal is to provide services to the client and manage the dynamics without affecting the client's requests. The Hadoop platform is very good at this, with a high level of fault tolerance.
  8. After all the work is completed and the client contract is done, the reporting manager (ApplicationMaster) notifies the resource manager that the resources are released (container releases) and also releases himself (unregisters), so that everyone is available again.
  9. Now the resources are back in the pool and everyone is happy.
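The "no double booking" rule from steps 4 to 6, and the return to the pool in steps 8 and 9, can be sketched as a small central ledger in Python. This is a toy under my own assumptions: the names Node, Container and ToyResourceManager, the department names and all the capacities are hypothetical and are not part of any YARN API.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One department run by a supervisor (NodeManager); capacities are made up."""
    name: str
    free_vcores: int
    free_memory_mb: int


@dataclass
class Container:
    """A slice of one node handed to a reporting manager (ApplicationMaster)."""
    node: str
    vcores: int
    memory_mb: int


class ToyResourceManager:
    """Central ledger: the only place that decides who gets which resources."""

    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}

    def allocate(self, count, vcores, memory_mb):
        """Grant up to `count` containers, never more than what is free."""
        granted = []
        for node in self.nodes.values():
            while (len(granted) < count
                   and node.free_vcores >= vcores
                   and node.free_memory_mb >= memory_mb):
                # Book the resources before handing them out.
                node.free_vcores -= vcores
                node.free_memory_mb -= memory_mb
                granted.append(Container(node.name, vcores, memory_mb))
        return granted

    def release(self, container):
        """Return a finished container's resources to the pool."""
        node = self.nodes[container.node]
        node.free_vcores += container.vcores
        node.free_memory_mb += container.memory_mb

    def free_vcores(self):
        return sum(n.free_vcores for n in self.nodes.values())
```

Because every request goes through the same ledger, a second client asking for 5 consultants while only 3 are free simply gets 3. Nobody is promised to two clients at once.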

 

YARN ALONE

Let us go through the same sequence technically now.

  1. A client program submits the application
  2. ResourceManager allocates a specified container to start the ApplicationMaster
  3. ApplicationMaster, on boot-up, registers with ResourceManager
  4. ApplicationMaster negotiates with ResourceManager for appropriate resource containers
  5. On successful container allocations, ApplicationMaster contacts NodeManager to launch the container
  6. Application code is executed within the container, which then reports its execution status back to the ApplicationMaster
  7. During execution, the client communicates directly with ApplicationMaster or ResourceManager to get status, progress updates etc.
  8. Once the application is complete, ApplicationMaster unregisters with ResourceManager and shuts down, allowing its own container to be reclaimed
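If you prefer to remember the order as code, the sequence above can be sketched as a tiny Python state machine that refuses out-of-order steps. The step names in Step and the class ToyApplication are hypothetical labels I invented for this post; they are not YARN's own application states.

```python
from enum import Enum


class Step(Enum):
    """Hypothetical step names for this post; not YARN's own state names."""
    SUBMITTED = 1             # client submits the application
    AM_STARTED = 2            # ResourceManager starts the ApplicationMaster
    AM_REGISTERED = 3         # ApplicationMaster registers with ResourceManager
    CONTAINERS_ALLOCATED = 4  # negotiation with ResourceManager succeeded
    CONTAINERS_LAUNCHED = 5   # NodeManagers launched the containers
    CODE_EXECUTED = 6         # application code ran and reported its status
    AM_UNREGISTERED = 7       # ApplicationMaster unregisters and shuts down


class ToyApplication:
    """Accepts the steps only in the order listed above."""

    def __init__(self):
        self.history = []

    def advance(self, step):
        order = list(Step)  # Enum members iterate in definition order
        if len(self.history) == len(order):
            raise ValueError("application already finished")
        expected = order[len(self.history)]
        if step is not expected:
            raise ValueError(f"expected {expected.name}, got {step.name}")
        self.history.append(step)

    def status(self):
        """What a client polling for progress (step 7) would see."""
        return self.history[-1].name if self.history else "NOT_SUBMITTED"
```

Step 7 in the list is not a transition of its own: the client can call status() at any point while the application moves along.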

[Figure not shown. Picture courtesy: Apache.org]

COMMON LIFE ALONE

Let us go through the same sequence without technology now.

  • A client submits the request
  • The Central Resource Manager assigns a Reporting Manager to take care of the request
  • The Reporting Manager acknowledges the assignment to the Central Resource Manager
  • The Reporting Manager negotiates with the Central Resource Manager for appropriate resources
  • On successful resource allocation, the Reporting Manager contacts the department supervisors to provide the resources
  • The resources are allocated to the client, and the status is then reported back to the Reporting Manager
  • During the client engagement, the client communicates directly with the Reporting Manager or the Central Resource Manager for progress updates etc.
  • Once the client engagement is complete, the Reporting Manager unregisters with the Central Resource Manager and makes himself/herself available again, along with the other resources

I don't know whether this blog simplified things or complicated them, but at least I had fun learning and remembering. I also want to thank ALL of the Apache committers, authors, bloggers, and everyone who has helped me learn these concepts to serve my clients, my company, and the community.

https://www.linkedin.com/in/manikandasamy/

BTW,

  1. While every caution has been taken to provide my readers with the most accurate information and honest analysis, please use your discretion before making any decisions based on the information in this blog. The author will not compensate you in any way whatsoever if you ever happen to suffer a loss, inconvenience, or damage because of, or while making use of, the information in this blog.
  2. Whether you like it or dislike it, post your comments. The former motivates me to share more with our community, and the latter helps me learn from you.
  3. Pardon me for the grammar mistakes.
