File operations in HDFS using Java


I am using HDP for Windows (1.3.0.0) on a single node, with Eclipse as the development environment. Below are a few samples that read from and write to HDFS.

  • Create a new Java Project in Eclipse.
  • In Java Settings, go to Libraries and click Add External JARs. Browse to the Hadoop installation folder and add the JAR file below.
    hadoop-core.jar
  • Go into the lib folder and add the JAR files below.
    commons-configuration-1.6.jar
    commons-lang-2.4.jar
    commons-logging-api-1.0.4.jar

The image above shows the required external JARs in the build path and their locations.
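One quick way to confirm that the build path is set up is to try loading one class from each JAR by name. The sketch below is a hypothetical helper (ClasspathCheck and isOnClasspath are names invented here, not part of Hadoop). The class names it probes are my assumption about what each JAR contains, so adjust them if your distribution differs.

```java
// Hypothetical helper: reports whether the classes the samples rely on can be loaded.
class ClasspathCheck {

    // Returns true if the named class can be found on the current classpath.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed representative classes, one per JAR listed above.
        String[] probes = {
            "org.apache.hadoop.fs.FileSystem",                 // hadoop-core
            "org.apache.commons.configuration.Configuration",  // commons-configuration
            "org.apache.commons.lang.StringUtils",             // commons-lang
            "org.apache.commons.logging.LogFactory"            // commons-logging-api
        };
        for (String name : probes) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

Running it from the Eclipse project should print "found" for every entry once the four JARs are on the build path; a "MISSING" line points at the JAR to re-check.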

  • Create a new Package under src and name it HDFSFileOperation.
  • Create a new class in the HDFSFileOperation package. Name it Operations.
  • Import the packages below.

import org.apache.hadoop.conf.Configuration;
//Needed to get the hadoop configuration.

import org.apache.hadoop.fs.*;
//Needed for HDFS file system operation.

import java.io.*;
//Needed for system input output operation.

  • Code for accessing the HDFS file system
public static void main(String[] args) throws IOException {
    FileSystem hdfs = FileSystem.get(new Configuration());
    Path homeDir = hdfs.getHomeDirectory();

    // Print the home directory
    System.out.println("Home folder -" + homeDir);
}

  • Add the code below to create and delete a directory
Path workingDir = hdfs.getWorkingDirectory();
Path newFolderPath = new Path("/MyDataFolder");
newFolderPath = Path.mergePaths(workingDir, newFolderPath);

if (hdfs.exists(newFolderPath)) {
    hdfs.delete(newFolderPath, true); // Delete the existing directory (recursively)
}

hdfs.mkdirs(newFolderPath); // Create the new directory
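The merge step above can look surprising, because "/MyDataFolder" reads like an absolute path. As I understand Path.mergePaths, it appends the second path to the first and keeps the first path's scheme and authority, so the new folder ends up under the working directory. The sketch below only mimics that effect with java.net.URI. MergePathsSketch, merge, and the namenode address are hypothetical names and values chosen for illustration, not Hadoop's implementation.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical stand-in that mimics the effect of the merge step with plain URIs.
class MergePathsSketch {

    // Keeps the scheme and authority of 'base' and appends 'child' to its path.
    static URI merge(URI base, String child) throws URISyntaxException {
        return new URI(base.getScheme(), base.getAuthority(), base.getPath() + child, null, null);
    }

    public static void main(String[] args) throws URISyntaxException {
        // Assumed example value; the real working directory depends on the cluster and user.
        URI workingDir = URI.create("hdfs://namenode:8020/user/Administrator");
        System.out.println(merge(workingDir, "/MyDataFolder"));
    }
}
```

With the assumed working directory, the merged result is hdfs://namenode:8020/user/Administrator/MyDataFolder rather than a folder at the file system root.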

  • Code for copying a file from the local file system to HDFS
Path localFilePath = new Path("c://localdata/datafile1.txt");
Path hdfsFilePath = new Path(newFolderPath + "/dataFile1.txt");
hdfs.copyFromLocalFile(localFilePath, hdfsFilePath);

  • Copying a file from HDFS to the local file system
localFilePath = new Path("c://hdfsdata/datafile1.txt");
hdfs.copyToLocalFile(hdfsFilePath, localFilePath);

  • Creating a file in HDFS
Path newFilePath = new Path(newFolderPath + "/newFile.txt");
hdfs.createNewFile(newFilePath);

  • Writing data to an HDFS file
StringBuilder sb = new StringBuilder();
for (int i = 1; i <= 5; i++) {
    sb.append("Data");
    sb.append(i);
    sb.append("\n");
}
byte[] byt = sb.toString().getBytes();
FSDataOutputStream fsOutStream = hdfs.create(newFilePath);
fsOutStream.write(byt);
fsOutStream.close();

  • Reading data from an HDFS file
BufferedReader bfr = new BufferedReader(new InputStreamReader(hdfs.open(newFilePath)));
String str = null;
while ((str = bfr.readLine()) != null) {
    System.out.println(str);
}
bfr.close(); // Release the underlying HDFS input stream
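A possible variation on the write and read steps above pins the text encoding to UTF-8 instead of the platform default and lets try-with-resources close the streams. This is only a sketch, assuming Java 7 or later. StreamRoundTrip, writeLines, and readLines are hypothetical names, and in-memory streams stand in for the streams returned by hdfs.create(newFilePath) and hdfs.open(newFilePath), which are ordinary OutputStream and InputStream subclasses.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class StreamRoundTrip {

    // Writes "Data1".."Data<count>", one per line, encoded as UTF-8, then closes the stream.
    static void writeLines(OutputStream out, int count) throws IOException {
        try (OutputStream o = out) {
            StringBuilder sb = new StringBuilder();
            for (int i = 1; i <= count; i++) {
                sb.append("Data").append(i).append("\n");
            }
            o.write(sb.toString().getBytes(StandardCharsets.UTF_8));
        }
    }

    // Reads all lines as UTF-8 text, then closes the stream.
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // In-memory streams stand in for hdfs.create(newFilePath) and hdfs.open(newFilePath).
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeLines(sink, 5);
        for (String line : readLines(new ByteArrayInputStream(sink.toByteArray()))) {
            System.out.println(line);
        }
    }
}
```

In the Operations class the same helpers could be called as writeLines(hdfs.create(newFilePath), 5) and readLines(hdfs.open(newFilePath)), so the streams are closed even if an exception is thrown partway through.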

You can run the code directly from Eclipse if it has no errors, but in practice we need a JAR file to run it in Hadoop. To create a JAR file and use it in Hadoop, follow the steps below:

  • Right-click on the project and select Export.
  • Select JAR file under Java and click Next.
  • Provide the location where you want the JAR file to be exported, and click Finish.

  • To execute it in Hadoop, use the Hadoop command below.

hadoop jar <jar file> [mainClass]
e.g. hadoop jar c:\Users\Administrator\Documents\fileoperations.jar HDFSFileOperation.Operations

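If it executes without errors, the console shows the progress messages printed by the program, such as "Folder Created." and "Written data to HDFS file.", followed by the lines read back from the file.

You can check the created file by browsing the HDFS file system in a web browser.

The complete code of the class file is given below.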

package HDFSFileOperation;

import java.io.*;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

public class Operations {

    public static void main(String[] args) throws IOException {
        FileSystem hdfs = FileSystem.get(new Configuration());

        // Print the home directory
        System.out.println("Home folder -" + hdfs.getHomeDirectory());

        // Create & delete directories
        Path workingDir = hdfs.getWorkingDirectory();
        Path newFolderPath = new Path("/MyDataFolder");
        newFolderPath = Path.mergePaths(workingDir, newFolderPath);
        if (hdfs.exists(newFolderPath)) {
            // Delete the existing directory (recursively)
            hdfs.delete(newFolderPath, true);
            System.out.println("Existing Folder Deleted.");
        }
        hdfs.mkdirs(newFolderPath); // Create the new directory
        System.out.println("Folder Created.");

        // Copying a file from local to HDFS
        Path localFilePath = new Path("c://localdata/datafile1.txt");
        Path hdfsFilePath = new Path(newFolderPath + "/dataFile1.txt");
        hdfs.copyFromLocalFile(localFilePath, hdfsFilePath);
        System.out.println("File copied from local to HDFS.");

        // Copying a file from HDFS to local
        localFilePath = new Path("c://hdfsdata/datafile1.txt");
        hdfs.copyToLocalFile(hdfsFilePath, localFilePath);
        System.out.println("Files copied from HDFS to local.");

        // Creating a file in HDFS
        Path newFilePath = new Path(newFolderPath + "/newFile.txt");
        hdfs.createNewFile(newFilePath);

        // Writing data to an HDFS file
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= 5; i++) {
            sb.append("Data");
            sb.append(i);
            sb.append("\n");
        }
        byte[] byt = sb.toString().getBytes();
        FSDataOutputStream fsOutStream = hdfs.create(newFilePath);
        fsOutStream.write(byt);
        fsOutStream.close();
        System.out.println("Written data to HDFS file.");

        // Reading data from an HDFS file
        System.out.println("Reading from HDFS file.");
        BufferedReader bfr = new BufferedReader(
                new InputStreamReader(hdfs.open(newFilePath)));
        String str = null;
        while ((str = bfr.readLine()) != null) {
            System.out.println(str);
        }
        bfr.close(); // Release the underlying HDFS input stream
    }
}

2 thoughts on “File operations in HDFS using java”

  1. Hello,
    I am trying to read contents from a file on HDFS. When I run the code there is no error, but nothing is displayed. Normal System.out.println(“hello world”)

  2. Hi,

    I am trying to run my code for accessing HDFS using “java -jar” (without using the “hadoop jar” command). However, I can’t get the correct hdfs home directory. Does your example work using “java -jar”?
