Monthly Archives: March 2014

File operations in HDFS using Java

I am using HDP for Windows (single node) with Eclipse as the development environment. Below are a few samples for reading from and writing to HDFS.

  • Create a new Java Project in Eclipse.
  • In Java Settings, open the Libraries tab and choose Add External JARs. Browse to the Hadoop installation folder and add this JAR file: hadoop-core.jar
  • Go into the lib folder and add the JAR files below: commons-configuration-1.6.jar
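
Once the JARs are on the build path, a quick sanity check can save some debugging later. The sketch below is only an illustration: the class name ClasspathCheck is made up, and the two class names it looks for are my assumption about what hadoop-core.jar and commons-configuration-1.6.jar provide. Adjust them to match your installation.

```java
final class ClasspathCheck {

    /** Returns true if the named class can be located, without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The class itself, or something it depends on, is not available.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed class names; one per JAR added in the steps above.
        String[] expected = {
            "org.apache.hadoop.fs.FileSystem",
            "org.apache.commons.configuration.Configuration"
        };
        for (String name : expected) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

Run it as a Java application from the same Eclipse project. Any line reporting MISSING suggests that the corresponding JAR has not been added to the build path yet.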

Quick notes on YARN (Hadoop 2.0)

Problems we had before YARN:

  • The JobTracker alone is responsible for both managing resources and tracking task progress.
  • Scalability limit: the maximum cluster size is about 4,000 nodes.
  • The maximum number of concurrent tasks is about 40,000.
  • A JobTracker failure kills all queued and running jobs; users must resubmit them.
  • Restarting the JobTracker is complex.
  • Low resource utilization, because cluster resources cannot be shared or allocated flexibly.
  • Supports only MapReduce. Iterative applications implemented on top of MapReduce are much slower.
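
To make the utilization point concrete, here is a small sketch comparing a node whose capacity is split into fixed map and reduce slots (the pre-YARN model) with a node where any free container can run any task. The class name, the slot counts and the workload are made-up numbers for illustration, not measurements.

```java
final class SlotUtilizationSketch {

    /** Tasks that can run at once when map and reduce slots are not interchangeable. */
    static int runningWithFixedSlots(int mapSlots, int reduceSlots,
                                     int pendingMaps, int pendingReduces) {
        return Math.min(mapSlots, pendingMaps) + Math.min(reduceSlots, pendingReduces);
    }

    /** Tasks that can run at once when all capacity is one pool of generic containers. */
    static int runningWithSharedPool(int containers, int pendingMaps, int pendingReduces) {
        return Math.min(containers, pendingMaps + pendingReduces);
    }

    public static void main(String[] args) {
        // Hypothetical node: 6 map slots + 4 reduce slots, or 10 generic containers.
        int mapSlots = 6;
        int reduceSlots = 4;
        int capacity = mapSlots + reduceSlots;

        // Hypothetical workload in the map phase: many map tasks, no reduce tasks yet.
        int pendingMaps = 10;
        int pendingReduces = 0;

        int fixed = runningWithFixedSlots(mapSlots, reduceSlots, pendingMaps, pendingReduces);
        int shared = runningWithSharedPool(capacity, pendingMaps, pendingReduces);

        System.out.println("Fixed slots: " + fixed + " of " + capacity + " busy");
        System.out.println("Shared pool: " + shared + " of " + capacity + " busy");
    }
}
```

With these numbers the four reduce slots sit idle during the map phase, while a shared pool could keep the whole node busy.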
