Monthly Archives: March 2014

File operations in HDFS using Java

I am using HDP for Windows (1.3.0.0) on a single node, with Eclipse as the development environment. Below are a few samples for reading from and writing to HDFS.

  • Create a new Java Project in Eclipse.
  • In Java Settings, go to Libraries and choose Add External JARs. Browse to the Hadoop installation folder and add the following JAR file:
    hadoop-core.jar
  • Go into the lib folder and add the following JAR files:
    commons-configuration-1.6.jar
    commons-lang-2.4.jar
    commons-logging-api-1.0.4.jar
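
Once the JARs are on the build path, a quick way to confirm the setup is to try loading one class from each library. The sketch below is only a sanity check, not part of the HDFS samples themselves. The class name ClasspathCheck and the helper isOnClasspath are made up for illustration, and the four probed class names are my assumption about which class best represents each JAR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: probes the classpath for one class per required library.
final class ClasspathCheck {

    // Returns true if the named class can be located on the current classpath.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: locate the class without running its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Library label -> a class assumed to ship in that JAR.
        Map<String, String> probes = new LinkedHashMap<>();
        probes.put("Hadoop core", "org.apache.hadoop.fs.FileSystem");
        probes.put("Commons Configuration", "org.apache.commons.configuration.Configuration");
        probes.put("Commons Lang", "org.apache.commons.lang.StringUtils");
        probes.put("Commons Logging", "org.apache.commons.logging.LogFactory");

        for (Map.Entry<String, String> probe : probes.entrySet()) {
            String status = isOnClasspath(probe.getValue()) ? "found" : "MISSING";
            System.out.println(probe.getKey() + ": " + status);
        }
    }
}
```

Run it as a Java application from the same Eclipse project. Any line reporting MISSING suggests the corresponding JAR was not added to the build path.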


Quick notes on YARN (Hadoop 2.0)

Problems we had before YARN:

  • The JobTracker is solely responsible for managing resources and tracking task progress.
  • Scalability limitation: the maximum cluster size is around 4,000 nodes.
  • The maximum number of concurrent tasks is around 40,000.
  • The JobTracker is a single point of failure: if it fails, all queued and running jobs are killed, and users need to resubmit them.
  • Restarting is complex.
  • Resource utilization is low because there is no flexibility in sharing and allocating cluster resources.
  • Only MapReduce is supported. Iterative applications implemented on top of MapReduce run much slower.
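
The resource-utilization point can be made concrete with a toy calculation. In classic MapReduce each TaskTracker is configured with a fixed number of map slots and reduce slots, whereas YARN hands out generic containers. The sketch below is a simplified model, not a Hadoop API. The class SlotModel, the slot counts, and the one-task-per-container assumption are all hypothetical.

```java
// Toy model (not a Hadoop API): how many tasks can run at once on a single node.
final class SlotModel {

    // Pre-YARN style: map tasks may only use map slots, reduce tasks only reduce slots.
    static int runnableWithFixedSlots(int mapSlots, int reduceSlots,
                                      int pendingMaps, int pendingReduces) {
        return Math.min(mapSlots, pendingMaps) + Math.min(reduceSlots, pendingReduces);
    }

    // YARN-style simplification: any container can host any kind of task.
    static int runnableWithContainers(int containers, int pendingMaps, int pendingReduces) {
        return Math.min(containers, pendingMaps + pendingReduces);
    }

    public static void main(String[] args) {
        int mapSlots = 4;        // hypothetical node configuration
        int reduceSlots = 2;
        int pendingMaps = 6;     // map-heavy phase: no reduce work is ready yet
        int pendingReduces = 0;

        int capacity = mapSlots + reduceSlots;
        int fixed = runnableWithFixedSlots(mapSlots, reduceSlots, pendingMaps, pendingReduces);
        int pooled = runnableWithContainers(capacity, pendingMaps, pendingReduces);

        System.out.println("fixed slots: " + fixed + " of " + capacity + " in use");
        System.out.println("containers:  " + pooled + " of " + capacity + " in use");
    }
}
```

With these made-up numbers, the fixed-slot node leaves its two reduce slots idle during the map phase, while the pooled model can use the full capacity for whatever work is pending.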
