Hadoop Blog: Hadoop Interview questions

Friday, November 26, 2010

Hadoop Interview questions - Part 2

Q11. Give an example scenario where a combiner can be used and where it cannot be used.

There are several examples; the following are the most common ones:

- Scenario where you can use combiner

Getting the list of distinct words in a file

- Scenario where you cannot use a combiner

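Calculating the mean of a list of numbers. The mean of partial means is generally not the mean of the whole list, so the reduce logic cannot be reused unchanged as a combiner.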
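To see why the mean is a problem, here is a minimal Python sketch that simulates two mappers whose outputs are combined locally before the reduce step. The data and variable names are made up for illustration, and this is plain Python, not Hadoop API code.

```python
# Toy simulation of combiners; plain Python, not Hadoop API code.
# Each inner list stands for the values emitted by one mapper (hypothetical data).
map_outputs = [[1, 2, 3], [10]]

def mean(values):
    return sum(values) / len(values)

# Naive approach: reuse "mean" as the combiner, then average the partial means.
naive = mean([mean(chunk) for chunk in map_outputs])

# Correct answer: the mean over all values at once.
true_mean = mean([v for chunk in map_outputs for v in chunk])

# Workaround: the combiner emits (sum, count) pairs, which can be merged safely.
partials = [(sum(chunk), len(chunk)) for chunk in map_outputs]
total, count = map(sum, zip(*partials))
fixed = total / count

# Distinct words: set union is commutative and associative, so a combiner is safe.
word_chunks = [["a", "b", "a"], ["b", "c"]]
combined = [set(chunk) for chunk in word_chunks]
distinct = sorted(set().union(*combined))

print(naive, true_mean, fixed, distinct)
```

Here the naive combiner gives 6.0 while the true mean is 4.0. Carrying (sum, count) pairs through the combiner and dividing only in the reducer recovers the correct value.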

Q12. What is the Job Tracker?

The Job Tracker is the service within Hadoop that runs Map Reduce jobs on the cluster.

Q13. What are some typical functions of the Job Tracker?

The following are some typical tasks of the Job Tracker:

- It accepts jobs from clients

- It talks to the NameNode to determine the location of the data

- It locates TaskTracker nodes with available slots at or near the data

- It submits the work to the chosen Task Tracker nodes and monitors the progress of each task by receiving heartbeat signals from the Task Trackers
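The "at or near the data" step can be pictured with a small Python sketch. This is a toy model under stated assumptions, not Hadoop's scheduler: the `Tracker` class, the `pick_tracker` function, and the host and rack names are all hypothetical.

```python
# Toy model of locality-aware task placement; not Hadoop's actual scheduler code.
from dataclasses import dataclass

@dataclass
class Tracker:
    host: str
    rack: str
    free_slots: int

def pick_tracker(trackers, block_hosts, rack_of):
    """Prefer a tracker on a host holding the data, then one on the same rack, then any."""
    available = [t for t in trackers if t.free_slots > 0]
    block_racks = {rack_of[h] for h in block_hosts}
    for t in available:
        if t.host in block_hosts:
            return t, "data-local"
    for t in available:
        if t.rack in block_racks:
            return t, "rack-local"
    if available:
        return available[0], "off-rack"
    return None, "no free slot"

# Hypothetical cluster: n1 holds the block but has no free slot.
trackers = [Tracker("n1", "r1", 0), Tracker("n2", "r1", 2), Tracker("n3", "r2", 1)]
rack_of = {t.host: t.rack for t in trackers}
chosen, kind = pick_tracker(trackers, {"n1"}, rack_of)
print(chosen.host, kind)
```

In this example the block lives on n1, which is busy, so the sketch falls back to n2 on the same rack.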

Q14. What is a Task Tracker?

A Task Tracker is a node in the cluster that accepts tasks (Map, Reduce and Shuffle operations) from a Job Tracker.

Q15. What's the relationship between Jobs and Tasks in Hadoop?

One job is broken down into one or many tasks in Hadoop.

Q16. Suppose Hadoop spawned 100 tasks for a job and one of the tasks failed. What will Hadoop do?

It will restart the task on some other Task Tracker. Only if the same task fails 4 times (the default, which can be changed through mapred.map.max.attempts and mapred.reduce.max.attempts) will it kill the job.
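The retry behaviour can be sketched in a few lines of Python. This is a toy simulation that assumes a limit of 4 attempts per task; `run_with_retries` and the `attempt` callback are hypothetical names, not Hadoop code.

```python
# Toy simulation of task retries; not Hadoop code.
MAX_ATTEMPTS = 4  # assumed default attempt limit per task; configurable in Hadoop

def run_with_retries(attempt, max_attempts=MAX_ATTEMPTS):
    """Call attempt(n) for n = 1, 2, ... until it returns True or the limit is hit.

    Returns a (status, attempts_used) tuple.
    """
    for n in range(1, max_attempts + 1):
        if attempt(n):
            return "task succeeded", n
    return "job failed", max_attempts

# A task that fails twice and succeeds on its third attempt.
flaky = run_with_retries(lambda n: n == 3)

# A task that never succeeds, so the whole job is given up.
broken = run_with_retries(lambda n: False)

print(flaky, broken)
```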

Q17. Hadoop achieves parallelism by dividing tasks across many nodes, so it is possible for a few slow nodes to rate-limit the rest of the program and slow it down. What mechanism does Hadoop provide to combat this?

Speculative Execution

Q18. How does speculative execution work in Hadoop?

The Job Tracker makes different Task Trackers process the same input. When tasks complete, they announce this fact to the Job Tracker. Whichever copy of a task finishes first becomes the definitive copy. If other copies were executing speculatively, Hadoop tells the Task Trackers to abandon those tasks and discard their outputs. The Reducers then receive their inputs from whichever Mapper completed successfully first.
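The "first copy wins" rule can be illustrated with a short Python sketch. It is a toy model only: the tracker names and durations are made-up numbers, and `resolve_speculative` is a hypothetical helper, not part of Hadoop.

```python
# Toy model of speculative execution; the durations are made-up numbers.
def resolve_speculative(attempts):
    """attempts: dict mapping tracker name -> seconds that copy of the task would take.

    Returns the tracker whose copy finishes first and the trackers whose
    copies are abandoned (their outputs are discarded).
    """
    winner = min(attempts, key=attempts.get)
    abandoned = sorted(t for t in attempts if t != winner)
    return winner, abandoned

# The same task runs on a slow node and, speculatively, on a faster one.
winner, abandoned = resolve_speculative({"tracker-a": 95.0, "tracker-b": 40.0})
print(winner, abandoned)
```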

Q19. Using the command line in Linux, how will you

- see all jobs running in the Hadoop cluster

- kill a job

To see all running jobs:

- hadoop job -list

To kill a job (where jobid is the ID of the job to kill):

- hadoop job -kill jobid

Q20. What is Hadoop Streaming?

Streaming is a generic API that allows programs written in virtually any language to be used as Hadoop Mapper and Reducer implementations.

Q21. What characteristic of the Streaming API makes it flexible enough to run Map Reduce jobs in languages like Perl, Ruby, Awk etc.?

Hadoop Streaming allows arbitrary programs to be used for the Mapper and Reducer phases of a Map Reduce job: both Mappers and Reducers receive their input on stdin and emit output (key, value) pairs on stdout.
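As an illustration of the stdin/stdout contract, here is a minimal word-count sketch in Python. It assumes tab-separated key and value on each line, which is the usual Streaming convention. Putting both steps in one file and selecting them with a "map" or "reduce" argument is a choice made here for brevity, not something Streaming requires.

```python
#!/usr/bin/env python3
# Minimal Streaming-style word count; mapper and reducer in one file for brevity.
import sys
from itertools import groupby

def mapper(lines):
    """Emit one "word<TAB>1" line for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(lines):
    """Sum the counts per word; assumes the input lines arrive sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

# Hypothetical convention: run as "wordcount.py map" or "wordcount.py reduce".
if __name__ == "__main__" and len(sys.argv) > 1:
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

The two steps can be checked locally without a cluster by sorting the mapper output before handing it to the reducer, which stands in for the shuffle: `list(reducer(sorted(mapper(["a b a", "b c"]))))`.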

2 comments:

Vaibhav, April 30, 2011 at 1:25 PM
I guess the answer to 11 can be generalized as in where the operation in the Mapper and Reducer is commutative.. that would be more precise.
Username, May 11, 2011 at 10:51 PM
Yes, thats true. Thanks for the feedback Vaibhav.